
Merge pull request #1003 from dericp/develop-curriculum-learning-rework

Curriculum learning now supports multiple brains.
/develop-generalizationTraining-TrainerController
GitHub 6 years ago
Current commit
322d2bbe
21 changed files, with 536 additions and 204 deletions
  1. .gitignore (5 changes)
  2. docs/Training-Curriculum-Learning.md (124 changes)
  3. python/tests/test_unitytrainers.py (53 changes)
  4. python/unityagents/environment.py (4 changes)
  5. python/unitytrainers/__init__.py (1 change)
  6. python/unitytrainers/curriculum.py (107 changes)
  7. python/unitytrainers/exception.py (5 changes)
  8. python/unitytrainers/trainer.py (8 changes)
  9. python/unitytrainers/trainer_controller.py (89 changes)
  10. python/curricula/wall-jump/BigWallBrain.json (3 changes)
  11. python/tests/test_curriculum.py (93 changes)
  12. python/tests/test_meta_curriculum.py (109 changes)
  13. python/unitytrainers/meta_curriculum.py (105 changes)
  14. python/curricula/push-block/PushBlockBrain.json (12 changes)
  15. python/curricula/wall-jump/SmallWallBrain.json (10 changes)
  16. python/curricula/push.json (12 changes)
  17. /python/curricula/test/TestBrain.json (renamed)
  18. /python/curricula/wall-jump/BigWallBrain.json (renamed)

.gitignore (5 changes)


*.eggs*
*.gitignore.swp
# VSCode hidden files
*.vscode/
.DS_Store
# pytest cache
*.pytest_cache/

docs/Training-Curriculum-Learning.md (124 changes)


## Sample Environment
Imagine a task in which an agent needs to scale a wall to arrive at a goal. The
starting point when training an agent to accomplish this task will be a random
policy. That starting policy will have the agent running in circles, and will
likely never, or very rarely, scale the wall properly to achieve the reward.
If we start with a simpler task, such as moving toward an unobstructed goal,
then the agent can easily learn to accomplish the task. From there, we can
slowly add to the difficulty of the task by increasing the size of the wall,
until the agent can complete the initially near-impossible task of scaling the
wall. We are including just such an environment with the ML-Agents toolkit 0.2,
called __Wall Jump__.
_Demonstration of a curriculum training scenario in which a progressively taller
wall obstructs the path to the goal._
To see this in action, observe the two learning curves below. Each displays the
reward over time for an agent trained using PPO with the same set of training
hyperparameters. The difference is that one agent was trained using the
full-height wall version of the task, and the other agent was trained using the
curriculum version of the task. As you can see, without using curriculum
learning the agent has a lot of difficulty. We think that by using well-crafted
curricula, agents trained using reinforcement learning will be able to
accomplish tasks that would otherwise be much more difficult.
Each Brain in an environment can have a corresponding curriculum. These
curriculums are held in what we call a metacurriculum. A metacurriculum allows
different brains to follow different curriculums within the same environment.
### Specifying a Metacurriculum
We first create a folder inside `python/curricula/` for the environment we want
to use curriculum learning with. For example, if we were creating a
metacurriculum for Wall Jump, we would create the folder
`python/curricula/wall-jump/`. We will place our curriculums inside this folder.
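To make the naming convention concrete, here is a minimal sketch (not part of
the PR) that lists such a folder and derives brain names the same way
`MetaCurriculum` does, by taking the basename of each file without its
extension:

```python
import os

# List the Wall Jump metacurriculum folder; each JSON file names the Brain
# whose curriculum it holds, e.g. 'BigWallBrain.json' -> 'BigWallBrain'.
for curriculum_filename in os.listdir('python/curricula/wall-jump/'):
    brain_name = curriculum_filename.split('.')[0]
    print(brain_name, '->', curriculum_filename)
```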
### Specifying a Curriculum
In order to define a curriculum, the first step is to decide which parameters of
the environment will vary. In the case of the Wall Jump environment, what varies
is the height of the wall. We define this as a `Reset Parameter` in the Academy
object of our scene, and by doing so it becomes adjustable via the Python API.
Rather than adjusting it by hand, we will create a JSON file which
describes the structure of the curriculum. Within it, we can specify which
points in the training process our wall height will change, either based on the
percentage of training steps which have taken place, or what the average reward
the agent has received in the recent past is. Below is an example curriculum for
the BigWallBrain in the Wall Jump environment.
```json
{
    "measure" : "progress",
    "thresholds" : [0.1, 0.3, 0.5],
    "min_lesson_length" : 2,
    "signal_smoothing" : true,
    "parameters" :
    {
        "big_wall_min_height" : [0.0, 4.0, 6.0, 8.0],
        "big_wall_max_height" : [4.0, 7.0, 8.0, 8.0]
    }
}
```
* `measure` - What to measure learning progress, and advancement in lessons, by.
  * `reward` - Uses a measure of received reward.
  * `progress` - Uses the ratio of steps to max_steps.
* `thresholds` (float array) - Points in value of `measure` where the lesson
  should be increased.
* `min_lesson_length` (int) - How many times the progress measure should be
  reported before incrementing the lesson.
* `signal_smoothing` (true/false) - Whether to weight the current progress
  measure by previous values.
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
  Academy reset parameters to control. The length of each array should be one
  greater than the number of thresholds; a quick validation sketch follows
  this list.
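This hedged sketch (not part of the PR; the file path is the Wall Jump example
from above) checks that length constraint for every parameter in a curriculum
file:

```python
import json

# Each parameter array must hold len(thresholds) + 1 values: one per lesson.
with open('python/curricula/wall-jump/BigWallBrain.json') as data_file:
    data = json.load(data_file)

expected = len(data['thresholds']) + 1
for name, values in data['parameters'].items():
    assert len(values) == expected, \
        '{0} has {1} values but needs {2}'.format(name, len(values), expected)
```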
Once our curriculum is defined, we have to use the reset parameters we defined
and modify the environment from the agent's `AgentReset()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example. Note that if the Academy's __Max Steps__ is not set to some
positive number the environment will never be reset. The Academy must reset
for the environment to reset.
We will save this file into our metacurriculum folder with the name of its
corresponding Brain. For example, in the Wall Jump environment, there are two
brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
the BigWallBrain, we will save `BigWallBrain.json` into
`python/curricula/wall-jump/`.
### Training with a Curriculum
Once we have specified our metacurriculum and curriculums, we can launch
`learn.py` using the `--curriculum` flag to point to the metacurriculum folder
and PPO will train using Curriculum Learning. For example, to train agents in
the Wall Jump environment with curriculum learning, we can run `python learn.py
--curriculum=curricula/wall-jump/ --run-id=wall-jump-curriculum --train`. We can
then keep track of the current lessons and progress via TensorBoard.
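Under the hood, the trainer controller gathers one progress value per brain,
asks the metacurriculum to advance lessons, and resets the environment with
the combined config. The sketch below paraphrases that flow from
`trainer_controller.py` in this PR; the `trainers`, `env`, and
`meta_curriculum` objects are assumed to already be set up:

```python
# Paraphrased sketch of the curriculum bookkeeping in trainer_controller.py.
# Assumes `trainers` (dict of brain name to trainer), `env`, and
# `meta_curriculum` are already constructed.
progresses = {}
for brain_name, curriculum in meta_curriculum.brains_to_curriculums.items():
    if curriculum.measure == 'progress':
        # Fraction of the maximum training steps taken so far.
        progresses[brain_name] = (trainers[brain_name].get_step /
                                  trainers[brain_name].get_max_steps)
    elif curriculum.measure == 'reward':
        progresses[brain_name] = trainers[brain_name].get_last_reward

meta_curriculum.increment_lessons(progresses)
env.reset(config=meta_curriculum.get_config(), train_mode=True)
```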

python/tests/test_unitytrainers.py (53 changes)


memory_size: 8
''')
dummy_curriculum = json.loads('''{
    "measure" : "reward",
    "thresholds" : [10, 20, 50],
    "min_lesson_length" : 3,
    "signal_smoothing" : true,
    "parameters" :
    {
        "param1" : [0.7, 0.5, 0.3, 0.1],
        "param2" : [100, 50, 20, 15],
        "param3" : [0.2, 0.3, 0.7, 0.9]
    }
}''')
bad_curriculum = json.loads('''{
    "measure" : "reward",
    "thresholds" : [10, 20, 50],
    "min_lesson_length" : 3,
    "signal_smoothing" : false,
    "parameters" :
    {
        "param1" : [0.7, 0.5, 0.3, 0.1],
        "param2" : [100, 50, 20],
        "param3" : [0.2, 0.3, 0.7, 0.9]
    }
}''')
@mock.patch('unityagents.UnityEnvironment.executable_launcher')
@mock.patch('unityagents.UnityEnvironment.get_communicator')

batch_size=None, training_length=2)
assert len(b.update_buffer['action']) == 10
assert np.array(b.update_buffer['action']).shape == (10, 2, 2)
def test_curriculum():
    open_name = '%s.open' % __name__
    with mock.patch('json.load') as mock_load:
        with mock.patch(open_name, create=True) as mock_open:
            mock_open.return_value = 0
            mock_load.return_value = bad_curriculum
            with pytest.raises(CurriculumError):
                Curriculum('tests/test_unityagents.py', {"param1": 1, "param2": 1, "param3": 1})
            mock_load.return_value = dummy_curriculum
            with pytest.raises(CurriculumError):
                Curriculum('tests/test_unityagents.py', {"param1": 1, "param2": 1})
            curriculum = Curriculum('tests/test_unityagents.py', {"param1": 1, "param2": 1, "param3": 1})
            assert curriculum.get_lesson_number == 0
            curriculum.set_lesson_number(1)
            assert curriculum.get_lesson_number == 1
            curriculum.increment_lesson(10)
            assert curriculum.get_lesson_number == 1
            curriculum.increment_lesson(30)
            curriculum.increment_lesson(30)
            assert curriculum.get_lesson_number == 1
            assert curriculum.lesson_length == 3
            curriculum.increment_lesson(30)
            assert curriculum.get_config() == {'param1': 0.3, 'param2': 20, 'param3': 0.7}
            assert curriculum.get_config(0) == {"param1": 0.7, "param2": 100, "param3": 0.2}
            assert curriculum.lesson_length == 0
            assert curriculum.get_lesson_number == 2
if __name__ == '__main__':

python/unityagents/environment.py (4 changes)


"""
if config is None:
config = self._resetParameters
elif config != {}:
logger.info("\nAcademy Reset with parameters : \t{0}"
elif config:
logger.info("Academy reset with parameters: {0}"
.format(', '.join([str(x) + ' -> ' + str(config[x]) for x in config])))
for k in config:
if (k in self._resetParameters) and (isinstance(config[k], (int, float))):

python/unitytrainers/__init__.py (1 change)


from .buffer import *
from .curriculum import *
from .meta_curriculum import *
from .models import *
from .trainer_controller import *
from .bc.models import *

python/unitytrainers/curriculum.py (107 changes)


import os
import json
import logging

from .exception import CurriculumError

logger = logging.getLogger('unitytrainers')


class Curriculum(object):

    def __init__(self, location, default_reset_parameters):
        """
        Initializes a Curriculum object.
        :param location: Path to JSON defining curriculum.
        :param default_reset_parameters: Set of reset parameters for environment.
        """
        self.lesson_length = 0
        self.max_lesson_num = 0
        self.measure = None
        self._lesson_num = 0
        # The name of the brain should be the basename of the file without the
        # extension.
        self._brain_name = os.path.basename(location).split('.')[0]
        try:
            with open(location) as data_file:
                self.data = json.load(data_file)
        except IOError:
            raise CurriculumError(
                'The file {0} could not be found.'.format(location))
        except UnicodeDecodeError:
            raise CurriculumError('There was an error decoding {}'.format(location))
        self.smoothing_value = 0
        for key in ['parameters', 'measure', 'thresholds',
                    'min_lesson_length', 'signal_smoothing']:
            if key not in self.data:
                raise CurriculumError("{0} does not contain a "
                                      "{1} field.".format(location, key))
        parameters = self.data['parameters']
        self.measure = self.data['measure']
        self.max_lesson_num = len(self.data['thresholds'])
        for key in parameters:
            if key not in default_reset_parameters:
                raise CurriculumError(
                    'The parameter {0} in Curriculum {1} is not present in '
                    'the Environment'.format(key, location))
            if len(parameters[key]) != self.max_lesson_num + 1:
                raise CurriculumError(
                    'The parameter {0} in Curriculum {1} must have {2} values '
                    'but {3} were found'.format(key, location,
                                                self.max_lesson_num + 1,
                                                len(parameters[key])))

    @property
    def lesson_num(self):
        return self._lesson_num

    @lesson_num.setter
    def lesson_num(self, lesson_num):
        self._lesson_num = max(0, min(lesson_num, self.max_lesson_num))

    def increment_lesson(self, progress):
        """
        Increments the lesson number depending on the progress given.
        :param progress: Measure of progress (either reward or percentage of
               steps completed).
        """
        if self.data is None or progress is None:
            return
        if self.data['signal_smoothing']:
            # Weight the current progress measure by previous values.
            progress = self.smoothing_value * 0.25 + 0.75 * progress
            self.smoothing_value = progress
        self.lesson_length += 1
        if self.lesson_num < self.max_lesson_num:
            if ((progress > self.data['thresholds'][self.lesson_num]) and
                    (self.lesson_length > self.data['min_lesson_length'])):
                self.lesson_length = 0
                self.lesson_num += 1
                config = {}
                parameters = self.data['parameters']
                for key in parameters:
                    config[key] = parameters[key][self.lesson_num]
                logger.info('{0} lesson changed. Now in lesson {1}: {2}'
                            .format(self._brain_name,
                                    self.lesson_num,
                                    ', '.join([str(x) + ' -> ' + str(config[x])
                                               for x in config])))

    def get_config(self, lesson=None):
        """
        Returns reset parameters which correspond to the lesson.
        :param lesson: The lesson to get the config of. If None, the current
               lesson is used.
        """
        if self.data is None:
            return {}
        if lesson is None:
            lesson = self.lesson_num
        lesson = max(0, min(lesson, self.max_lesson_num))
        config = {}
        parameters = self.data['parameters']
        for key in parameters:
            config[key] = parameters[key][lesson]
        return config
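A hedged usage sketch for the class above; the file path and reset parameters
mirror the Wall Jump example, and the progress values are illustrative:

```python
from unitytrainers import Curriculum

# Assumes the BigWallBrain curriculum shown in the docs section above
# (measure 'progress', thresholds [0.1, 0.3, 0.5], min_lesson_length 2).
curriculum = Curriculum('python/curricula/wall-jump/BigWallBrain.json',
                        {'big_wall_min_height': 0.0,
                         'big_wall_max_height': 4.0})

# The lesson advances only once the smoothed progress passes the current
# threshold and the lesson has been reported more than min_lesson_length times.
for _ in range(4):
    curriculum.increment_lesson(0.35)

print(curriculum.lesson_num)    # 1 with the values assumed above
print(curriculum.get_config())  # reset parameters for the current lesson
```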

python/unitytrainers/exception.py (5 changes)


    Any error related to training with a curriculum.
    """
    pass


class MetaCurriculumError(TrainerError):
    """
    Any error related to the configuration of a metacurriculum.
    """
    pass

python/unitytrainers/trainer.py (8 changes)


from unityagents import UnityException, AllBrainInfo

logger = logging.getLogger("unitytrainers")


class UnityTrainerException(UnityException):

        """
        raise UnityTrainerException("The update_model method was not implemented.")

    def write_summary(self, lesson_num):
        """
        :param lesson_num: The lesson the trainer is at.
        """
        if (self.get_step % self.trainer_parameters['summary_freq'] == 0 and self.get_step != 0 and
                self.is_training and self.get_step <= self.get_max_steps):

                stat_mean = float(np.mean(self.stats[key]))
                summary.value.add(tag='Info/{}'.format(key), simple_value=stat_mean)
                self.stats[key] = []
            summary.value.add(tag='Info/Lesson', simple_value=lesson_num)
            self.summary_writer.add_summary(summary, self.get_step)
            self.summary_writer.flush()

python/unitytrainers/trainer_controller.py (89 changes)


# Launches unitytrainers for each External Brain in a Unity Environment
import logging
import re

import numpy as np
import tensorflow as tf

from unityagents import UnityEnvironment, UnityEnvironmentException
from unitytrainers.meta_curriculum import MetaCurriculum
from unitytrainers.exception import MetaCurriculumError
    def __init__(self, env_path, run_id, save_freq, curriculum_folder, fast_simulation, load, train,
                 worker_id, keep_checkpoints, lesson, seed, docker_target_name, trainer_config_path,
                 no_graphics):
        """

        :param curriculum_folder: Folder containing JSON curriculums for the env
        :param fast_simulation: Whether to run the game at training speed
        :param load: Whether to load the model or randomly initialize
        :param train: Whether to train model, or only run inference

        :param no_graphics: Whether to run the Unity simulator in no-graphics mode
        """
        self.trainer_config_path = trainer_config_path
        if env_path is not None:
            env_path = (env_path.strip()
                        .replace('.app', '')

        self.curriculum_folder = curriculum_folder
        self.summaries_dir = './summaries'
        else:
            self.docker_training = True

            if env_path is not None:
                env_path = '/{docker_target_name}/{env_name}'.format(docker_target_name=docker_target_name,
                                                                     env_name=env_path)
            if curriculum_folder is not None:
                self.curriculum_folder = '/{docker_target_name}/{curriculum_folder}'.format(
                    docker_target_name=docker_target_name,
                    curriculum_folder=curriculum_folder)
        self.logger = logging.getLogger("unityagents")
        self.run_id = run_id
        self.save_freq = save_freq

            self.env_name = 'editor_' + self.env.academy_name
        else:
            # Extract out name of environment
            self.env_name = os.path.basename(os.path.normpath(env_path))

        if curriculum_folder is None:
            self.meta_curriculum = None
        else:
            self.meta_curriculum = MetaCurriculum(self.curriculum_folder,
                                                  self.env._resetParameters)

        if self.meta_curriculum is not None and self.curriculum_folder is not None:
            for brain_name in self.meta_curriculum.brains_to_curriculums.keys():
                if brain_name not in self.env.external_brain_names:
                    raise MetaCurriculumError('One of the curriculums '
                                              'defined in ' +
                                              self.curriculum_folder + ' '
                                              'does not have a corresponding '
                                              'Brain. Check that the '
                                              'curriculum file has the same '
                                              'name as the Brain '
                                              'whose curriculum it defines.')
    def _get_progresses(self):
        if self.meta_curriculum is not None:
            brain_names_to_progresses = {}
            for brain_name, curriculum in self.meta_curriculum.brains_to_curriculums.items():
                if curriculum.measure == "progress":
                    progress = (self.trainers[brain_name].get_step /
                                self.trainers[brain_name].get_max_steps)
                    brain_names_to_progresses[brain_name] = progress
                elif curriculum.measure == "reward":
                    progress = self.trainers[brain_name].get_last_reward
                    brain_names_to_progresses[brain_name] = progress
            return brain_names_to_progresses
        else:
            return None

    def _initialize_trainers(self, trainer_config, sess):
        trainer_parameters_dict = {}
        # TODO: This probably doesn't need to be reinitialized.
        self.trainers = {}
        for brain_name in self.env.external_brain_names:
            trainer_parameters = trainer_config['default'].copy()

                        .format(model_path))

    def start_learning(self):
        # TODO: Should be able to start learning at different lesson numbers
        # for each curriculum.
        self.meta_curriculum.set_all_curriculums_to_lesson_num(self.lesson)
        trainer_config = self._load_config()
        self._create_model_path(self.model_path)

            self._initialize_trainers(trainer_config, sess)
            for _, t in self.trainers.items():
                self.logger.info(t)
            init = tf.global_variables_initializer()
            saver = tf.train.Saver(max_to_keep=self.keep_checkpoints)

            else:
                sess.run(init)
            global_step = 0  # This is only for saving the model
            self.meta_curriculum.increment_lessons(self._get_progresses())
            curr_info = self.env.reset(config=self.meta_curriculum.get_config(),
                                       train_mode=self.fast_simulation)
            if self.train_model:
                for brain_name, trainer in self.trainers.items():
                    trainer.write_tensorboard_text('Hyperparameters', trainer.parameters)

                    self.meta_curriculum.increment_lessons(self._get_progresses())
                    curr_info = self.env.reset(config=self.meta_curriculum.get_config(),
                                               train_mode=self.fast_simulation)
                    for brain_name, trainer in self.trainers.items():
                        trainer.end_episode()
                # Decide and take an action

                        # Perform gradient descent with experience buffer
                        trainer.update_model()
                    # Write training statistics to Tensorboard.
                    trainer.write_summary(self.meta_curriculum.brains_to_curriculums[brain_name].lesson_num)
                    if self.train_model and trainer.get_step <= trainer.get_max_steps:
                        trainer.increment_step_and_update_last_reward()
                if self.train_model:

python/curricula/wall-jump/BigWallBrain.json (3 changes)


"parameters" :
{
"big_wall_min_height" : [0.0, 4.0, 6.0, 8.0],
"big_wall_max_height" : [4.0, 7.0, 8.0, 8.0],
"small_wall_height" : [1.5, 2.0, 2.5, 4.0]
"big_wall_max_height" : [4.0, 7.0, 8.0, 8.0]
}
}

python/tests/test_curriculum.py (93 changes)


import pytest
import json
from unittest.mock import patch, mock_open

from unitytrainers.exception import CurriculumError
from unitytrainers import Curriculum

dummy_curriculum_json_str = '''
{
    "measure" : "reward",
    "thresholds" : [10, 20, 50],
    "min_lesson_length" : 3,
    "signal_smoothing" : true,
    "parameters" :
    {
        "param1" : [0.7, 0.5, 0.3, 0.1],
        "param2" : [100, 50, 20, 15],
        "param3" : [0.2, 0.3, 0.7, 0.9]
    }
}
'''

bad_curriculum_json_str = '''
{
    "measure" : "reward",
    "thresholds" : [10, 20, 50],
    "min_lesson_length" : 3,
    "signal_smoothing" : false,
    "parameters" :
    {
        "param1" : [0.7, 0.5, 0.3, 0.1],
        "param2" : [100, 50, 20],
        "param3" : [0.2, 0.3, 0.7, 0.9]
    }
}
'''


@pytest.fixture
def location():
    return 'TestBrain.json'


@pytest.fixture
def default_reset_parameters():
    return {"param1": 1, "param2": 1, "param3": 1}


@patch('builtins.open', new_callable=mock_open, read_data=dummy_curriculum_json_str)
def test_init_curriculum_happy_path(mock_file, location, default_reset_parameters):
    curriculum = Curriculum(location, default_reset_parameters)

    assert curriculum._brain_name == 'TestBrain'
    assert curriculum.lesson_num == 0
    assert curriculum.measure == 'reward'


@patch('builtins.open', new_callable=mock_open, read_data=bad_curriculum_json_str)
def test_init_curriculum_bad_curriculum_raises_error(mock_file, location, default_reset_parameters):
    with pytest.raises(CurriculumError):
        Curriculum(location, default_reset_parameters)


@patch('builtins.open', new_callable=mock_open, read_data=dummy_curriculum_json_str)
def test_increment_lesson(mock_file, location, default_reset_parameters):
    curriculum = Curriculum(location, default_reset_parameters)
    assert curriculum.lesson_num == 0

    curriculum.lesson_num = 1
    assert curriculum.lesson_num == 1

    curriculum.increment_lesson(10)
    assert curriculum.lesson_num == 1

    curriculum.increment_lesson(30)
    curriculum.increment_lesson(30)
    assert curriculum.lesson_num == 1
    assert curriculum.lesson_length == 3

    curriculum.increment_lesson(30)
    assert curriculum.lesson_length == 0
    assert curriculum.lesson_num == 2


@patch('builtins.open', new_callable=mock_open, read_data=dummy_curriculum_json_str)
def test_get_config(mock_file):
    curriculum = Curriculum('TestBrain.json', {"param1": 1, "param2": 1, "param3": 1})
    assert curriculum.get_config() == {"param1": 0.7, "param2": 100, "param3": 0.2}

    curriculum.lesson_num = 2
    assert curriculum.get_config() == {'param1': 0.3, 'param2': 20, 'param3': 0.7}
    assert curriculum.get_config(0) == {"param1": 0.7, "param2": 100, "param3": 0.2}

python/tests/test_meta_curriculum.py (109 changes)


import pytest
from unittest.mock import patch, call, Mock

from unitytrainers.meta_curriculum import MetaCurriculum
from unitytrainers.exception import MetaCurriculumError


class MetaCurriculumTest(MetaCurriculum):
    """This class allows us to test MetaCurriculum objects without calling
    MetaCurriculum's __init__ function.
    """
    def __init__(self, brains_to_curriculums):
        self._brains_to_curriculums = brains_to_curriculums


@pytest.fixture
def default_reset_parameters():
    return {'param1': 1, 'param2': 2, 'param3': 3}


@pytest.fixture
def more_reset_parameters():
    return {'param4': 4, 'param5': 5, 'param6': 6}


@pytest.fixture
def progresses():
    return {'Brain1': 0.2, 'Brain2': 0.3}


@patch('unitytrainers.Curriculum.get_config', return_value={})
@patch('unitytrainers.Curriculum.__init__', return_value=None)
@patch('os.listdir', return_value=['Brain1.json', 'Brain2.json'])
def test_init_meta_curriculum_happy_path(listdir, mock_curriculum_init,
                                         mock_curriculum_get_config,
                                         default_reset_parameters):
    meta_curriculum = MetaCurriculum('test/', default_reset_parameters)

    assert len(meta_curriculum.brains_to_curriculums) == 2
    assert 'Brain1' in meta_curriculum.brains_to_curriculums
    assert 'Brain2' in meta_curriculum.brains_to_curriculums

    calls = [call('test/Brain1.json', default_reset_parameters),
             call('test/Brain2.json', default_reset_parameters)]
    mock_curriculum_init.assert_has_calls(calls)


@patch('os.listdir', side_effect=NotADirectoryError())
def test_init_meta_curriculum_bad_curriculum_folder_raises_error(listdir):
    with pytest.raises(MetaCurriculumError):
        MetaCurriculum('test/', default_reset_parameters)


@patch('unitytrainers.Curriculum')
@patch('unitytrainers.Curriculum')
def test_set_lesson_nums(curriculum_a, curriculum_b):
    meta_curriculum = MetaCurriculumTest({'Brain1': curriculum_a,
                                          'Brain2': curriculum_b})

    meta_curriculum.lesson_nums = {'Brain1': 1, 'Brain2': 3}

    assert curriculum_a.lesson_num == 1
    assert curriculum_b.lesson_num == 3


@patch('unitytrainers.Curriculum')
@patch('unitytrainers.Curriculum')
def test_increment_lessons(curriculum_a, curriculum_b, progresses):
    meta_curriculum = MetaCurriculumTest({'Brain1': curriculum_a,
                                          'Brain2': curriculum_b})

    meta_curriculum.increment_lessons(progresses)

    curriculum_a.increment_lesson.assert_called_with(0.2)
    curriculum_b.increment_lesson.assert_called_with(0.3)


@patch('unitytrainers.Curriculum')
@patch('unitytrainers.Curriculum')
def test_set_all_curriculums_to_lesson_num(curriculum_a, curriculum_b):
    meta_curriculum = MetaCurriculumTest({'Brain1': curriculum_a,
                                          'Brain2': curriculum_b})

    meta_curriculum.set_all_curriculums_to_lesson_num(2)

    assert curriculum_a.lesson_num == 2
    assert curriculum_b.lesson_num == 2


@patch('unitytrainers.Curriculum')
@patch('unitytrainers.Curriculum')
def test_get_config(curriculum_a, curriculum_b, default_reset_parameters,
                    more_reset_parameters):
    curriculum_a.get_config.return_value = default_reset_parameters
    curriculum_b.get_config.return_value = default_reset_parameters
    meta_curriculum = MetaCurriculumTest({'Brain1': curriculum_a,
                                          'Brain2': curriculum_b})

    assert meta_curriculum.get_config() == default_reset_parameters

    curriculum_b.get_config.return_value = more_reset_parameters
    new_reset_parameters = dict(default_reset_parameters)
    new_reset_parameters.update(more_reset_parameters)
    assert meta_curriculum.get_config() == new_reset_parameters

python/unitytrainers/meta_curriculum.py (105 changes)


"""Contains the MetaCurriculum class."""
import os
from unitytrainers.curriculum import Curriculum
from unitytrainers.exception import MetaCurriculumError
import logging
logger = logging.getLogger('unitytrainers')
class MetaCurriculum(object):
"""A MetaCurriculum holds curriculums. Each curriculum is associated to a particular
brain in the environment.
"""
def __init__(self, curriculum_folder, default_reset_parameters):
"""Initializes a MetaCurriculum object.
Args:
curriculum_folder (str): The relative or absolute path of the
folder which holds the curriculums for this environment.
The folder should contain JSON files whose names are the
brains that the curriculums belong to.
default_reset_parameters (dict): The default reset parameters
of the environment.
"""
used_reset_parameters = set()
self._brains_to_curriculums = {}
try:
for curriculum_filename in os.listdir(curriculum_folder):
brain_name = curriculum_filename.split('.')[0]
curriculum_filepath = \
os.path.join(curriculum_folder, curriculum_filename)
curriculum = Curriculum(curriculum_filepath, default_reset_parameters)
# Check if any two curriculums use the same reset params.
if any([(parameter in curriculum.get_config().keys()) for parameter in used_reset_parameters]):
logger.warning('Two or more curriculums will '
'attempt to change the same reset '
'parameter. The result will be '
'non-deterministic.')
used_reset_parameters.update(curriculum.get_config().keys())
self._brains_to_curriculums[brain_name] = curriculum
except NotADirectoryError:
raise MetaCurriculumError(curriculum_folder + ' is not a '
'directory. Refer to the ML-Agents '
'curriculum learning docs.')
@property
def brains_to_curriculums(self):
"""A dict from brain_name to the brain's curriculum."""
return self._brains_to_curriculums
@property
def lesson_nums(self):
"""A dict from brain name to the brain's curriculum's lesson number."""
lesson_nums = {}
for brain_name, curriculum in self.brains_to_curriculums.items():
lesson_nums[brain_name] = curriculum.lesson_num
return lesson_nums
@lesson_nums.setter
def lesson_nums(self, lesson_nums):
for brain_name, lesson in lesson_nums.items():
self.brains_to_curriculums[brain_name].lesson_num = lesson
def increment_lessons(self, progresses):
"""Increments all the lessons of all the curriculums in this MetaCurriculum.
Args:
progresses (dict): A dict of brain name to progress.
"""
for brain_name, progress in progresses.items():
self.brains_to_curriculums[brain_name].increment_lesson(progress)
def set_all_curriculums_to_lesson_num(self, lesson_num):
"""Sets all the curriculums in this meta curriculum to a specified lesson number.
Args:
lesson_num (int): The lesson number which all the curriculums will
be set to.
"""
for _, curriculum in self.brains_to_curriculums.items():
curriculum.lesson_num = lesson_num
def get_config(self):
"""Get the combined configuration of all curriculums in this MetaCurriculum.
Returns:
A dict from parameter to value.
"""
config = {}
for _, curriculum in self.brains_to_curriculums.items():
curr_config = curriculum.get_config()
config.update(curr_config)
return config
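A hedged usage sketch for `MetaCurriculum`; the folder and parameter names
mirror the Wall Jump example, and the progress values are illustrative:

```python
from unitytrainers.meta_curriculum import MetaCurriculum

# Assumes the wall-jump metacurriculum folder described in the docs above.
meta_curriculum = MetaCurriculum('python/curricula/wall-jump/',
                                 {'big_wall_min_height': 0.0,
                                  'big_wall_max_height': 4.0,
                                  'small_wall_height': 1.5})

# One progress value per brain, keyed by brain name (the JSON file basename).
meta_curriculum.increment_lessons({'BigWallBrain': 0.2,
                                   'SmallWallBrain': 0.2})

print(meta_curriculum.lesson_nums)   # per-brain lesson numbers
print(meta_curriculum.get_config())  # combined config for env.reset(config=...)
```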

python/curricula/push-block/PushBlockBrain.json (12 changes)


{
    "measure" : "reward",
    "thresholds" : [0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75],
    "min_lesson_length" : 2,
    "signal_smoothing" : true,
    "parameters" :
    {
        "goal_size" : [3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "block_size": [1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "x_variation":[1.5, 1.55, 1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
    }
}

python/curricula/wall-jump/SmallWallBrain.json (10 changes)


{
    "measure" : "progress",
    "thresholds" : [0.1, 0.3, 0.5],
    "min_lesson_length" : 2,
    "signal_smoothing" : true,
    "parameters" :
    {
        "small_wall_height" : [1.5, 2.0, 2.5, 4.0]
    }
}

python/curricula/push.json (12 changes)


{
    "measure" : "reward",
    "thresholds" : [0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75],
    "min_lesson_length" : 2,
    "signal_smoothing" : true,
    "parameters" :
    {
        "goal_size" : [3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "block_size": [1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "x_variation":[1.5, 1.55, 1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
    }
}

/python/curricula/test.json → /python/curricula/test/TestBrain.json

/python/curricula/wall.json → /python/curricula/wall-jump/BigWallBrain.json
