
[bug-fix] Empty ignored trajectory queues, make sure queues don't overflow (#3451)

Branch: release-0.14.1
Anupam Bhatnagar · 5 years ago
Current commit: c70d0243
3 files changed, with 38 insertions and 4 deletions
  1. com.unity.ml-agents/CHANGELOG.md (17 changes)
  2. ml-agents/mlagents/trainers/ghost/trainer.py (23 changes)
  3. ml-agents/mlagents/trainers/tests/test_ghost.py (2 changes)

com.unity.ml-agents/CHANGELOG.md (17 changes)


The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Major Changes
- Agent.CollectObservations now takes a VectorSensor argument. It was also overloaded to optionally take an ActionMasker argument. (#3352, #3389)
### Minor Changes
- Monitor.cs was moved to Examples. (#3372)
- Automatic stepping for Academy is now controlled from the AutomaticSteppingEnabled property. (#3376)
- The GetEpisodeCount, GetStepCount, and GetTotalStepCount methods of Academy were changed to the EpisodeCount, StepCount, and TotalStepCount properties, respectively. (#3376)
- Several classes were changed from public to internal visibility. (#3390)
- Academy.RegisterSideChannel and UnregisterSideChannel methods were added. (#3391)
- A tutorial on adding custom SideChannels was added. (#3391)
- Updated Barracuda to 0.6.0-preview.
### Bugfixes
- Fixed an issue which caused self-play training sessions to consume a lot of memory. (#3451)
## [0.14.0-preview] - 2020-02-13
### Major Changes
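
The bugfix entry above is the subject of this commit: the ghost (self-play) trainer subscribes to a trajectory queue for every behavior, but before this change it only drained the queues belonging to the behavior it was currently learning. Queues for the other, ignored behaviors kept filling for the whole run, so memory grew without bound. A minimal sketch of that failure mode and of the dump-the-ignored-queue fix, using a plain Python deque as a hypothetical stand-in for AgentManagerQueue (the StandInQueue name, the maxlen value, and the production rate are illustrative, not from the ml-agents source):

from collections import deque


class StandInQueue:
    """Hypothetical stand-in for AgentManagerQueue: an unbounded FIFO with a nominal maxlen."""

    class Empty(Exception):
        pass

    def __init__(self, maxlen: int = 1000):
        self.maxlen = maxlen
        self._items: deque = deque()  # unbounded on purpose: overflow is the bug

    def put(self, item) -> None:
        self._items.append(item)

    def get_nowait(self):
        if not self._items:
            raise StandInQueue.Empty()
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


def dump_ignored_queue(queue: StandInQueue) -> None:
    # Mirrors the fix: discard at most maxlen stale trajectories per advance() call.
    try:
        for _ in range(queue.maxlen):
            queue.get_nowait()
    except StandInQueue.Empty:
        pass


ignored = StandInQueue()
for step in range(10_000):
    ignored.put(object())        # the environment keeps producing trajectories
    dump_ignored_queue(ignored)  # comment this out and len(ignored) reaches 10_000
print(len(ignored))              # 0: the ignored queue no longer accumulates

In the real trainer the same pattern appears twice: trajectories from the learning behavior are forwarded to the wrapped trainer, and trajectories from every other behavior are simply discarded, as the diff below shows.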

ml-agents/mlagents/trainers/ghost/trainer.py (23 changes)


         self.internal_policy_queues: List[AgentManagerQueue[Policy]] = []
         self.internal_trajectory_queues: List[AgentManagerQueue[Trajectory]] = []
+        self.ignored_trajectory_queues: List[AgentManagerQueue[Trajectory]] = []
         self.learning_policy_queues: Dict[str, AgentManagerQueue[Policy]] = {}
         # assign ghost's stats collection to wrapped trainer's

             self.trajectory_queues, self.internal_trajectory_queues
         ):
             try:
-                t = traj_queue.get_nowait()
-                # adds to wrapped trainers queue
-                internal_traj_queue.put(t)
-                self._process_trajectory(t)
+                # We grab at most the maximum length of the queue.
+                # This ensures that even if the queue is being filled faster than it is
+                # being emptied, the trajectories in the queue are on-policy.
+                for _ in range(traj_queue.maxlen):
+                    t = traj_queue.get_nowait()
+                    # adds to wrapped trainers queue
+                    internal_traj_queue.put(t)
+                    self._process_trajectory(t)
             except AgentManagerQueue.Empty:
                 pass

         if self.get_step - self.last_swap > self.steps_between_swap:
             self._swap_snapshots()
             self.last_swap = self.get_step

+        # Dump trajectories from non-learning policy
+        for traj_queue in self.ignored_trajectory_queues:
+            try:
+                for _ in range(traj_queue.maxlen):
+                    traj_queue.get_nowait()
+            except AgentManagerQueue.Empty:
+                pass
+
     def end_episode(self):
         self.trainer.end_episode()

             self.internal_trajectory_queues.append(internal_trajectory_queue)
             self.trainer.subscribe_trajectory_queue(internal_trajectory_queue)
+        else:
+            self.ignored_trajectory_queues.append(trajectory_queue)

 # Taken from https://github.com/Unity-Technologies/ml-agents/pull/1975 and
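
The comment introduced in the new loop is the other half of the change: the old code pulled a single trajectory per queue per advance() call, so a behavior that produced trajectories faster than that built up a backlog of increasingly stale, off-policy data. Pulling up to traj_queue.maxlen items per call keeps the backlog bounded. A rough, self-contained illustration of the difference (plain Python, not the ml-agents classes; the producer rate and the maxlen value are made-up numbers):

from collections import deque

MAXLEN = 20          # assumed queue bound, standing in for AgentManagerQueue.maxlen
ITEMS_PER_STEP = 3   # producer outpaces a one-item-per-call drain
STEPS = 200


def max_backlog(drain_per_call: int) -> int:
    queue: deque = deque()
    worst = 0
    for _ in range(STEPS):
        for _ in range(ITEMS_PER_STEP):
            queue.append(object())                  # trajectories arriving
        worst = max(worst, len(queue))              # backlog seen by the consumer
        for _ in range(min(drain_per_call, len(queue))):
            queue.popleft()                         # trajectories processed
    return worst


print("one item per call:     backlog grows to", max_backlog(1))       # roughly STEPS * 2
print("maxlen items per call: backlog stays at", max_backlog(MAXLEN))  # ITEMS_PER_STEP

The same for-loop-over-maxlen pattern is reused when dumping the ignored queues, so a single advance() call never does unbounded work even when a queue has been filling for a while.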

ml-agents/mlagents/trainers/tests/test_ghost.py (2 changes)


     # Check that ghost trainer ignored off policy queue
     assert trainer.trainer.update_buffer.num_experiences == 15
+    # Check that it emptied the queue
+    assert trajectory_queue1.empty()

 def test_publish_queue(dummy_config):
