ml-agents

Author	SHA1	Message	Commit date
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	7 years ago
Arthur Juliani	c3644f56	Buffer fix for properly masking gradients	7 years ago
GitHub	f134016b	On Demand Decision (#308) * On Demand Decision: Use RequestDecision and RequestAction * New Agent Inspector: Use it to set On Demand Decision * New BrainParameters interface * LSTM memory size is now set in Python * New C# API * Semantic Changes * Replaced RunMDP * New Bouncer Environment to test On Demand Decision	7 years ago
GitHub	e0d5b1b0	Fix for when not using teacher helper (#379) * Fix for when not using teacher helper * Rename expert to teacher throughout	7 years ago
GitHub	848b8a58	Fix PPO regression (#434) * Fix PPO regression	7 years ago
GitHub	c17937ef	Curiosity Driven Exploration & Pyramids Environments (#739) * Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer. * To enable, set use_curiosity flag to true in hyperparameter file. * Includes refactor of unitytrainers model code to accommodate new feature. * Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.	7 years ago
GitHub	47fc38ab	Additional Tests & Bug Fixes (#854) * Add tests and fix for sparse tensor warning * Rename mock communicator parameter * Test longer sequences * Curiosity tests and bug fixes	7 years ago
GitHub	b5722dc9	Fix for visual observation w/ curiosity (#873)	7 years ago
GitHub	6df07946	Fix for Discrete observations + Curiosity (#866)	7 years ago
Arthur Juliani	5e48766d	Remove discrete observations	7 years ago
Arthur Juliani	b46b8708	Rename function	7 years ago
Arthur Juliani	3659bbcd	Develop multi discrete (#1022) Replace discrete control with multi-discrete control.	6 years ago
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 years ago
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	2d4b4209	Use single scope declaration for models (#1160)	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit: Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment: Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
Ervin T	b30f4c90	Split `mlagents` into two packages (#1812) * Reorganize project * Fix all tests * Address comments * Delete init file * Update requirements * Tick version * Add timeout wait parameter (mlagents_envs) (#1699) * Add timeout wait param * Remove unnecessary function * Add new meta files for communicator objects * Fix all tests * update circleci * Reorganize mlagents_envs tests * WIP: test removing circleci cache * Move gym tests * Namespaced packages * Update installation instructions for separate packages * Remove unused package from setup script * Add Readme for ml-agents-envs * Clarify docs and re-comment compiler in make.bat * Add more doc to installation * Add back fix for Hololens * Recompile Protobufs * Change mlagents_envs to mlagents.envs in trainer_controller * Remove extraneous files, fix win bat script * Support Python 3.7 for envs package	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes. At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	5 years ago
Chris Elion	bb7773c1	add flake8 to precommit	5 years ago
GitHub	be4292fb	Add different types of visual encoder (nature cnn/resnet) Add resnet and nature cnn in addition to default visual encoder	5 years ago
GitHub	6225317d	refactor vis_encoder_type and add to doc refactor vis_encoder_type and add to doc	5 years ago
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 years ago
GitHub	d7ebaae1	Return list instead of np array for make_mini_batch() (#2371) Return list instead of np array for make_mini_batch() to reduce time copying data	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
GitHub	bd7eb286	Update reward signals in parallel with policy (#2362)	5 years ago
GitHub	3683cc1c	Enable learning rate decay to be disabled (#2567)	5 years ago
GitHub	36ed3c16	Fix issue exporting graph with multi-GPU (#2573) Our multi-GPU training had a regression such that freezing the graph was broken. This change fixes that issue by making a few changes: * Removes the top level "tower" variable scope added by multi-GPU so that the output nodes have correct names * Removes the use of "freeze_graph" and replaces it with our own similar functionality. * Adds the "auto reuse" to network layers which require them	5 years ago
Jonathan Harper	3fc14963	EXPERIMENTAL horovod support	5 years ago
Jonathan Harper	47893e9c	minor tweaks	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
GitHub	4da157fe	more pylint fixes (#2842)	5 years ago
Chris Elion	fca51de8	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
GitHub	99981937	fix errors from new flake8-comprehensions (#2917)	5 years ago
Ervin Teng	3a4fa244	Switch to tanh squash in PPO	5 years ago
Ervin Teng	b501f75b	reduce sum to do squashing properly	5 years ago
Ervin Teng	35d73d1d	Split value and policy networks	5 years ago
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207)	5 years ago
Ervin Teng	03c750a7	Move some functionality to optimizer	5 years ago
Ervin Teng	6688453b	Move some functionality to optimizer-black	5 years ago
Ervin Teng	91ffde5f	More incremental steps to separation	5 years ago
Ervin Teng	cd74e51b	More progress	5 years ago
Ervin Teng	2373cae8	Move methods into common optimizer	5 years ago
Ervin Teng	bc04f9dc	Working continuous updates	5 years ago
Ervin Teng	17dc17e5	Discrete PPO working	5 years ago

49 commits (eb251008-9054-4ea2-8fa0-0a9d9d2861e1)