ml-agents

Author	SHA1	Message	Commit date
Arthur Juliani	de700c3a	Multi Brain Training and Recurrent state encoder (#166) * `learn.py` is now main script for training brains. * Simultaneous multi-brain training is now possible. * `ghost-trainer` allows for proper training in adversarial scenarios. * `imitation-trainer` provides a basic implementation of real-time behavioral cloning. * All trainer hyperparameters now exist in `.yaml` files. * `PPO.ipynb` removed. * LSTM model added. * More dynamic buffer class to handle greater variety of scenarios.	7 years ago
GitHub	51621334	State Stacking & Banan Environment (#262) * Add support for stacking past n states to allow network to learn temporal dependencies. * Add Banana Collector environment for demonstrating partially observable multi-agent environments. * Add 3DBall Hard which lacks velocity information in state representation. Used as test for LSTM and state-stacking features. * Rework Tennis environment to be continuous control and trainable in 100k steps.	7 years ago
Arthur Juliani	4418421a	Rename variables in imitation trainer	7 years ago
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	7 years ago
eshvk	030ac5c5	[cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups.	7 years ago
GitHub	237b41f9	Hotfix 0.3.0c (#618) Fixes the following issues: * Missing component reference in BananaRL environment. * Neural Network for multiple visual observations was not properly generated. * Episode time-out value estimate bootstrapping used incorrect observation as input.	7 years ago
GitHub	702d98c6	[Fix] The summary writer is now implemented in the abtract trainer class. (#806) Summary writer now displays {}: Step: {}. No episode was completed since last summary. when there was no completed episodes	7 years ago
Arthur Juliani	d7338050	Enable concurrent sessions	7 years ago
Arthur Juliani	5d402be9	Minor Optimizations (#836)	7 years ago
Arthur Juliani	195ac934	Merge branch 'develop' into develop-runs # Conflicts: # python/learn.py # python/unitytrainers/trainer.py	7 years ago
Arthur Juliani	fad0da30	Log run-id in console	7 years ago
unityjeffrey	0d67f311	changed ml agents to ml-agents	7 years ago
unityjeffrey	19fb437a	changed to Unity ML-Agents Toolkit (english)	7 years ago
Arthur Juliani	f52d5a92	Merge remote-tracking branch 'origin/develop' into develop-runs	6 years ago
Deric Pang	c88c7e42	Fixing bugs, updating tests. - Added more unit tests for school module. - Fixed bugs found during testing with PushBlock env.	6 years ago
Deric Pang	db6fa4ba	Removing commented line.	6 years ago
Deric Pang	ff4ce695	Updated logging in trainer. - The logger in trainer.py is now unitytrainers. This makes it easier to differentiate it from unityagents logs.	6 years ago
Deric Pang	9d9c91e4	Fixed TensorBoard lesson logging.	6 years ago
Deric Pang	822d329a	Fixing bug when no curriculum folder is passed. - The old Curriculum object would accept None as a location for the curriculum. If the location was None, it would return default values as its config and lesson number. - The new MetaCurriculum does not accept None as a location for the curriculum folder. This was done to remove unnecessary edge case functionality from curriculums. - None checks have been added into trainer_controller. In the future, it should be possible to better refactor trainer_controller so that these None checks can be removed. This is preferable to hard-coding default behavior into MetaCurriculum objects when a metacurriculum would not even be in place.	6 years ago
Arthur Juliani	9e8049f0	Will now print summaries even when not training or when training is over (#1020) * [Initial Commit] * [Addressed comments] * [Now using global step to write the summaries]	6 years ago
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 years ago
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	29084e77	Curriculum learning reward thresholding bug fix (#1141)	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
GitHub	3c9603d6	Demonstration Recorder (#1240)	6 years ago
GitHub	840417ff	Use organized tags for tensorboard stats (#1248)	6 years ago
GitHub	78374601	vince's fix for model step (#1329)	6 years ago
GitHub	c258b1c3	Move 'take_action' into Policy class (#1669) * Move 'take_action' into Policy class This refactor is part of Actor-Trainer separation. Since policies will be distributed across actors in separate processes which share a single trainer, taking an action should be the responsibility of the policy. This change makes a few smaller changes: * Combines `take_action` logic between trainers, making it more generic * Adds an `ActionInfo` data class to be more explicit about the data returned by the policy, only used by TrainerController and policy for now. * Moves trainer stats logic out of `take_action` and into `add_experiences` * Renames 'take_action' to 'get_action'	6 years ago
Ervin T	b30f4c90	Split `mlagents` into two packages (#1812) * Reogranize project * Fix all tests * Address comments * Delete init file * Update requirements * Tick version * Add timeout wait parameter (mlagents_envs) (#1699) * Add timeout wait param * Remove unnecessary function * Add new meta files for communicator objects * Fix all tests * update circleci * Reorganize mlagents_envs tests * WIP: test removing circleci cache * Move gym tests * Namespaced packages * Update installation instructions for separate packages * Remove unused package from setup script * Add Readme for ml-agents-envs * Clarify docs and re-comment compiler in make.bat * Add more doc to installation * Add back fix for Hololens * Recompile Protobufs * Change mlagents_envs to mlagents.envs in trainer_controller * Remove extraneous files, fix win bat script * Support Python 3.7 for envs package	6 years ago
eshvk	cc9bdf17	Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return	6 years ago
eshvk	fb04c40c	Reorganize to make metrics collection more accurate	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	2671e1a0	Enable mypy in precommit checks (#2177) * WIP precommit on top level * update CI * circleci fixes * intentionally fail black * use --show-diff-on-failure in CI * fix command order * rebreak a file * apply black * WIP enable mypy * run mypy on each package * fix trainer_metrics mypy errors * more mypy errors * more mypy * Fix some partially typed functions * types for take_action_outputs * fix formatting * cleanup * generate stubs for proto objects * fix ml-agents-env mypy errors * disallow-incomplete-defs for gym-unity * Add CI notes to CONTRIBUTING.md	6 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	6 years ago
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	6 years ago
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 years ago
Chris Elion	bb7773c1	add flake8 to precommit	5 years ago
Chris Elion	5d07ca1f	Merge remote-tracking branch 'origin/develop' into enable-flake8	5 years ago
GitHub	19283bfa	Very simple environment for testing (#2266) * WIP doesn't crash * return stats and assert convergence * pass lint checks * rename * fix-reset-params * add time penalty * _get_measure_vals always returns something * fix tests * unused import * single env, fix double step * move LocalEnvManager to ml-agents-envs * move and rename EnvManager * remove obsolete docstring and method * clean up	5 years ago
GitHub	9eb3f049	Cleanup unused code in TrainerController (#2315) * Removes unused SubprocessEnvManager import in trainer_controller * Removes unused `steps` argument to `TrainerController._save_model` * Consolidates unnecessary branching for curricula in `TrainerController.advance` * Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds `BCTrainer` support so that we don't have a broken interface / undefined behavior when BCTrainer is used with curricula.	5 years ago
GitHub	83875376	Add "gauges" to timer system (#2329) * WIP still needs tests and merging from multiprocess * cleanup gauges * add TODO for subprocesses	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
GitHub	f628d18b	initialize trainer step count (#2498) (#2505) * initialize trainer step count * remove step init from RLTrainer	5 years ago
GitHub	7720db33	Fix run_id typing in trainer.py (#2537)	5 years ago
GitHub	3683cc1c	Enable learning rate decay to be disabled (#2567)	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
GitHub	4da157fe	more pylint fixes (#2842)	5 years ago
Chris Elion	fca51de8	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
Ervin Teng	748c250e	Somewhat running	5 years ago
Andrew Cohen	13fe9cf8	Bubbled up indexing of AllBrainInfo to trainer controller from trainers	5 years ago
Andrew Cohen	e96b80db	recieves brain_name and identifier on python side	5 years ago
Andrew Cohen	8578b0b7	add_policy and create_policy separated	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
GitHub	42bea858	Improve mypy coverage by adding --namespace-packages (#3049)	5 years ago
Andrew Cohen	614d276f	recieves brain_name and identifier on python side	5 years ago
Chris Elion	fdc810ff	move (first pass)	5 years ago
GitHub	58b6c7c2	Rename mlagents.envs to mlagents_envs (#3083)	5 years ago
Andrew Cohen	d1edbf43	add_policy and create_policy separated	5 years ago
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067)	5 years ago
GitHub	2ac242f7	Remove TrainerMetrics and add CSVWriter using new StatsWriter API (#3108)	5 years ago
GitHub	0b5b1b01	Develop magic string + trajectory (#3122) * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * recieves brain_name and identifier on python side * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * recieves brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * recieves brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * trainer_controller expects name_behavior_ids * add_policy and create_policy separated * adjusting tests to expect trainer.add_policy to be called * fixing tests * fixed naming ...	5 years ago
Andrew Cohen	082789ea	Merge branch 'master' into develop-magic-string	5 years ago
Andrew Cohen	b14680f1	fixing ci tests	5 years ago
Ervin Teng	e577d5ea	Fix some mypy issues and remove unused code	5 years ago
GitHub	bec2e8f0	Add Trajectory/Policy Queues, move Trainer logic to advance() (#3113)	5 years ago
Andrew Cohen	fc485077	fixed more ci problems/removed self.policies	5 years ago
Ervin Teng	db743971	Move private methods out of trainer, simplify interface	5 years ago
Andrew Cohen	c8514c18	Merge branch 'master' into develop-magic-string	5 years ago
GitHub	45010af3	Add stats reporter class and re-enable missing stats (#3076)	5 years ago
Ervin Teng	3d25f9d2	Merge branch 'master' into develop-agentprocessor	5 years ago
GitHub	d798b1cb	Prevent tf.Session() from eating up all the GPU memory (#3219) * Use soft placement and allow_growth for Session * Move config generation to tf utils * Re-add self.graph	5 years ago
GitHub	56a67403	Fix lost trajectories when they are produced faster than they are consumed (#3233) * Fix bug when trajectories are produced faster than they are consumed * Cap max length	5 years ago
Ervin Teng	29f3330f	Merge master into hotfix-0.13.1	5 years ago
Ervin Teng	9ad99eb6	Combined model and policy for PPO	5 years ago
GitHub	329b23e0	Fix extra summary being written when loading from checkpoint (#3272) * Load next summary properly * Add tests for add_policy and get_policy	5 years ago
GitHub	14193ada	Self-play for symmetric games (#3194)	5 years ago
Ervin Teng	db249ceb	Merge branch 'master' into develop-splitpolicyoptimizer	5 years ago
GitHub	587dd165	Support for ONNX export (#3101)	5 years ago
Ervin Teng	bcc25d59	Merge branch 'master' into develop-splitpolicyoptimizer	5 years ago
GitHub	e4177de0	[change] Organize trainer files a bit better (#3538)	5 years ago
Anupam Bhatnagar	abc369a6	Adding a logging utility for improved logs	5 years ago
Anupam Bhatnagar	f4dbedcf	removed extraneous logging imports and loggers	5 years ago
Anupam Bhatnagar	e8e0078e	first commit	5 years ago
Anupam Bhatnagar	07b15ae7	[skip-ci] small refactors	5 years ago
Anupam Bhatnagar	455adc60	[skip ci] continue training until worker-0 is done	5 years ago
Anupam Bhatnagar	e49f186b	removing logging statements	5 years ago
Ervin Teng	ce6ab0de	Make progress bar class and add to trainer	5 years ago
Ervin Teng	bcf073bf	Move console logging to ConsoleWriter	5 years ago
Ervin Teng	6b578de4	Merge branch 'develop-refactorprint' into develop-progress-bar	5 years ago
Ervin Teng	49df4038	Make progress bar a statswriter	5 years ago
GitHub	25cc9f15	[change] Move hyperparameter printing entirely into StatsWriters (#3630)	5 years ago
GitHub	ec278616	Hotfixes for Release 0.15.1 (#3698) * [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting ...	5 years ago
GitHub	6709a9bf	[change] Clean up trainer interface, clean up GhostTrainer stats (#3634)	5 years ago
Andrew Cohen	9f09a65d	team id centric ghost trainer	5 years ago
GitHub	4ecd6ad3	Fix how we set logging levels (#3703) * cleanup logging * comments and cleanup * pylint, gym	5 years ago
Andrew Cohen	59b88be6	Merge branch 'master' into self-play-mutex	5 years ago
Anupam Bhatnagar	50e52d9c	Merge branch 'master' into distributed-training	5 years ago
Andrew Cohen	3de78baa	wrapped trainer has internal policy ghost	5 years ago
Andrew Cohen	3013774b	alternative to internal-policy fix	5 years ago
Ervin Teng	ed06f37c	Ability to disable threading	5 years ago
Anupam Bhatnagar	001fce2a	first commit	5 years ago
Anupam Bhatnagar	9341f7a2	[skip-ci] small refactors	5 years ago
Anupam Bhatnagar	f36108a9	[skip ci] continue training until worker-0 is done	5 years ago
Anupam Bhatnagar	c49cc069	removing logging statements	5 years ago
Ervin Teng	5e980ec1	Merge branch 'master' into develop-sac-apex	5 years ago
Ervin Teng	9fe104d6	Make threading disable-able per trainer	5 years ago
Arthur Juliani	7c3bd376	Refactoring policy and optimizer	5 years ago
Arthur Juliani	212e2d1d	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 years ago
GitHub	232519e4	[refactor] Move output artifacts to a single results/ folder (#3829)	5 years ago
Arthur Juliani	1736559f	Combine actor and critic classes. Initial export.	5 years ago
Arthur Juliani	ca887743	Support tf and pytorch alongside one another	5 years ago
Arthur Juliani	89ad3020	Merge remote-tracking branch 'origin/master' into develop-add-fire # Conflicts: # ml-agents/mlagents/trainers/policy/tf_policy.py	5 years ago
Christopher Goy	ba80b292	format files with pre-commit.	4 years ago
GitHub	e92b4f88	[refactor] Structure configuration files into classes (#3936)	5 years ago
GitHub	09853e13	[refactor] Move checkpoint saving into trainer (#4034)	5 years ago
PSankalp Patro	45c4ea36	Save checkpoint files as .nn files in checkpoint directory	5 years ago
GitHub	7229214c	[cleanup] Remove unused param keys (#4067)	5 years ago
GitHub	a1c63c4b	Release 3 Cherry-pick bug-fixes and doc changes from master (#4102) * [bug-fix] Fix regression in --initialize-from feature (#4086) * Fixed text in GettingStarted page specifying the logdir for tensorboard. Before it was in a directory summaries which no longer existed. Results are now saved to the results dir. (#4085) * [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087) * Reverting bug introduced in #4071 (#4101) Co-authored-by: Scott <Scott.m.jordan91@gmail.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	5 years ago
Arthur Juliani	9724c9ac	Merge master	5 years ago
Jonathan Harper	80127232	Convert checkpoints to .nn format Fixed style Fixed more style Nit changes Fixed signature Convert checkpoints to .nn format Fixed style Nit changes Fixed tests, checkpoint management and style Check checkpoint management Modify statement on artifacts Nit changes Fixed signature Nit changes Fixed signature Fixed tests, checkpoint management and style Check checkpoint management Modify statement on artifacts	5 years ago
Ervin Teng	510583d2	Move memory validation to settings	5 years ago
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 years ago
Ruo-Ping Dong	71fe4df6	fix formatting and test	4 years ago
GitHub	3bcb029b	[refactor] Remove BrainParameters from Python code (#4138)	4 years ago
Ruo-Ping Dong	e06812aa	fix tests	4 years ago
GitHub	84440f05	Convert checkpoints to .NN (#4127) This change adds an export to .nn for each checkpoint generated by RLTrainer and adds a NNCheckpointManager to track the generated checkpoints and final model in training_status.json. Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>	4 years ago
Ruo-Ping Dong	95858e25	update saver interface and add tests	4 years ago
Andrew Cohen	a65d08c7	ghost trainer tests	4 years ago
GitHub	f16ce486	Update v2-staging from main (March 15) (#5123)	4 years ago


133 commits (20ae24dc-b21e-4343-903d-dd617110d143)