ml-agents

Author	SHA1	Message	Commit date
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	7 years ago
GitHub	e11dae1d	Python Testing & Image Inference Improvements (#353) * Reorganized python tests into separate folder, and make individual test files for different (sub) modules. * Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon. * Minor bug fixes discovered while writing tests. * Reworked GridWorld to reset much faster. * Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.	7 years ago
eshvk	030ac5c5	[cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups.	7 years ago
GitHub	f134016b	On Demand Decision (#308) * On Demand Decision : Use RequestDecision and RequestAction * New Agent Inspector : Use it to set On Demand Decision * New BrainParameters interface * LSTM memory size is now set in python * New C# API * Semantic Changes * Replaced RunMDP * New Bouncer Environment to test On Demand Decision	7 years ago
GitHub	69481d2d	Imitation Learning Helper (#371) * Add helper class for Imitation Learning teacher. Allows for clearing buffer "C" and toggling adding info to the buffer "R".	7 years ago
GitHub	dcf58f75	Feature/previous text action (#375) * [Previous Text Actions] Renamed previous_action to previous_vector_action added previous_text_action to the BrainInfo * [Semantics] Carried the modifications to the semantics of previous_vector_action to the trainers	7 years ago
GitHub	e0d5b1b0	Fix for when not using teacher helper (#379) * Fix for when not using teacher helper * Rename expert to teacher throughout	7 years ago
GitHub	a7c9096f	[Semantics] Modified the placeholder names (#381)	7 years ago
GitHub	6dd3c284	Hotfix 0.3.0b (#519) * Fixes internal brain for Banana Imitation. * Fixes Discrete Control training for Imitation Learning. * Fixes Visual Observations in internal brain with non-square inputs.	7 years ago
GitHub	237b41f9	Hotfix 0.3.0c (#618) Fixes the following issues: * Missing component reference in BananaRL environment. * Neural Network for multiple visual observations was not properly generated. * Episode time-out value estimate bootstrapping used incorrect observation as input.	7 years ago
GitHub	1a449e98	Hotfix 0.3.1b (#637) * [Fix] Use the stored agent info instead of the previous agent info when bootstrapping the value * [Bug Fix] Addressed #643 * [Added Line Break]	7 years ago
GitHub	755be43e	[Cold Fix] Making the episode length and mean reward more accurate for the first episode (#657)	7 years ago
Arthur Juliani	9477eaa9	Develop fix cumulative reward (#725) * [Cold Fix] Split the way cumulative rewards and episode length are counted. The reward is appended at each step to the cumulative reward. The episode count is ONLY incremented when d_t+1 is false	7 years ago
GitHub	702d98c6	[Fix] The summary writer is now implemented in the abstract trainer class. (#806) Summary writer now displays {}: Step: {}. No episode was completed since last summary. when there were no completed episodes	7 years ago
Arthur Juliani	d7338050	Enable concurrent sessions	6 years ago
eshvk	680b0767	[Imitation Learning] Minor fix to make sure that step increment loads from the last saved global step if the model is being trained after loading	6 years ago
Arthur Juliani	5d402be9	Minor Optimizations (#836)	6 years ago
GitHub	0f65e272	[Addresses #842] (#849) In the case the agent is done immediately after spawning, its stats are empty because the stats need at least 2 successive experiences to create the stats. By specifying the default value of 0, the error no longer appears	6 years ago
Arthur Juliani	5e48766d	Remove discrete observations	6 years ago
Arthur Juliani	195ac934	Merge branch 'develop' into develop-runs # Conflicts: # python/learn.py # python/unitytrainers/trainer.py	6 years ago
vincentpierre	e47cec56	[Initial Commit]	6 years ago
unityjeffrey	0d67f311	changed ml agents to ml-agents	6 years ago
unityjeffrey	19fb437a	changed to Unity ML-Agents Toolkit (english)	6 years ago
Arthur Juliani	6b359062	Fix for visual-only imitation learning	6 years ago
GitHub	e50ac7ae	Merge branch 'develop' into hotfix-0	6 years ago
Arthur Juliani	1eb701af	Merge remote-tracking branch 'origin/develop' into develop-value-estimates-ppo	6 years ago
Arthur Juliani	f52d5a92	Merge remote-tracking branch 'origin/develop' into develop-runs	6 years ago
GitHub	ef3025e6	Merge pull request #1004 from Unity-Technologies/develop-runs Enable multiple runs in learn.py	6 years ago
Arthur Juliani	3659bbcd	Develop multi discrete (#1022) Replace discrete control with multi-discrete control.	6 years ago
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 years ago
GitHub	ded0d8c7	Develop action masking (#1080) * [Initial Commit] Modified the model.py file and the ppo/trainer.py file to use masked actions * Preliminary modifications to the python side of the code to enable action masking * Preliminary modifications to the C# side of the code to enable action masking * Preliminary modifications to the communication side of the code to enable action masking * Implemented action masking for BC Note : The actions of the teacher are not masked * More error messages for the action masking * fix pytests * Added Documentation * Address comment * Addressed Comments on docs * Addressed second comment on docs * Addressed comments for the python side of the code * Created the action masker and associated unit tests * Addressed comments on the C# side * Addressed the comment regarding action_masking_name * Addressed the comments	6 years ago
Deric Pang	cdb41480	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	4a881354	fix the training doc (#1193)	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
GitHub	3c9603d6	Demonstration Recorder (#1240)	6 years ago
GitHub	840417ff	Use organized tags for tensorboard stats (#1248)	6 years ago
GitHub	c258b1c3	Move 'take_action' into Policy class (#1669) * Move 'take_action' into Policy class This refactor is part of Actor-Trainer separation. Since policies will be distributed across actors in separate processes which share a single trainer, taking an action should be the responsibility of the policy. This change makes a few smaller changes: * Combines `take_action` logic between trainers, making it more generic * Adds an `ActionInfo` data class to be more explicit about the data returned by the policy, only used by TrainerController and policy for now. * Moves trainer stats logic out of `take_action` and into `add_experiences` * Renames 'take_action' to 'get_action'	6 years ago
eshvk	cc9bdf17	Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return	6 years ago
eshvk	fb04c40c	Reorganize to make metrics collection more accurate	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	a4d5b2d3	Doc/comment cleanup - Fix some occurrences of 'the the' (#2119)	5 years ago
GitHub	2671e1a0	Enable mypy in precommit checks (#2177) * WIP precommit on top level * update CI * circleci fixes * intentionally fail black * use --show-diff-on-failure in CI * fix command order * rebreak a file * apply black * WIP enable mypy * run mypy on each package * fix trainer_metrics mypy errors * more mypy errors * more mypy * Fix some partially typed functions * types for take_action_outputs * fix formatting * cleanup * generate stubs for proto objects * fix ml-agents-env mypy errors * disallow-incomplete-defs for gym-unity * Add CI notes to CONTRIBUTING.md	5 years ago
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	5 years ago
GitHub	9eb3f049	Cleanup unused code in TrainerController (#2315) * Removes unused SubprocessEnvManager import in trainer_controller * Removes unused `steps` argument to `TrainerController._save_model` * Consolidates unnecessary branching for curricula in `TrainerController.advance` * Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds `BCTrainer` support so that we don't have a broken interface / undefined behavior when BCTrainer is used with curricula.	5 years ago
GitHub	b498c19d	Fix BCTrainer increment_steps (#2384)	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
Ervin Teng	c912d140	Make sure all tests pass on BC	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
Ervin Teng	e826f4bb	Bugfix for LSTM+BC (#2679) * Fix LSTM+BC in discrete case * Add test for Barracuda export * Fix LSTM training for BC	5 years ago
GitHub	4da157fe	more pylint fixes (#2842)	5 years ago
Andrew Cohen	13fe9cf8	Bubbled up indexing of AllBrainInfo to trainer controller from trainers	5 years ago
Ervin Teng	df5ee7bf	Split buffer into two buffers (PPO works)	5 years ago
GitHub	a2194ea7	Fix batch size issue with BC (#2965)	5 years ago
Ervin Teng	73000a6b	Merge branch 'develop' into develop-splitbuffer	5 years ago
GitHub	213cd68d	Split Buffer into processing and update buffers (#2964) This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps: - Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR) - Move the processing buffer and add/process experiences into a separate, outside class - Change the data type of the update buffer to be a Trajectory - Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
Ervin Teng	3697e616	Convert BC (warning) might be broken	5 years ago
Ervin Teng	38ff674e	Fix BC and tests	5 years ago
Ervin Teng	324d217b	Move agent_id to Trajectory	5 years ago
Ervin Teng	fdf9aea7	Make conversion methods part of NamedTuples	5 years ago
Ervin Teng	6242b67d	Add way to check if trajectory is done or max_reached	5 years ago

63 commits (fa638000-4c18-476a-a3a9-cb9e33ca5072)