ml-agents

Author	SHA1	Message	Commit date
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	ab5c49e8	Release v0.5 delete unityagents (#1151) * fixed the loggers * Modified the documentation	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
GitHub	6c354d16	New Learning Brain (#1303) * Initial Commit * attempt at refactor * Put all static methods into the CoreInternalBrain * improvements * more testing * modifications * renamed epsilon * misc * Now supports discrete actions * added discrete support and RNN and visual. Left to do is refactor and save variables into models * code cleaning * made a tensor generator and applier * fix on the models.py file * Moved the Checks to a different Class * Added some unit tests * BugFix * Need to generate the output tensors as well as inputs before executing the graph * Made NodeNames static and created a new namespace * Added comments to the TensorAppliers * Started adding comments on the TensorGenerators code * Added comments for the Tensor Generator * Moving the helper classes into a separate folder * Added initial comments to the TensorChecks * Renamed NodeNames -> TensorNames * Removing warnings in tests * Now using Aut...	6 years ago
vincentpierre	1045b6e7	Fix continuous curriosity	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	5 years ago
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 years ago
GitHub	84d9d622	python timers (#2180) * Timer proof-of-concept * micro optimizations * add some timers * cleanup, add asserts * Cleanup (no start/end methods) and handle exceptions * unit test and decorator * move output code, add a decorator * cleanup * module docstring * actually write the timings when done with training * use __qualname__ instead * add a few more timers * fix mock import * fix unit test * don't need fwd reference * cleanup root * always write timers, add comments * undo accidental change	5 years ago
GitHub	9c50abcf	GAIL and Pretraining (#2118) Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes: - A new GAILRewardSignal and GAILModel for GAIL/VAIL - A BCModule component (not a reward signal) to do pretraining during RL - Documentation for both of these - Change to Demo Loader that lets you load multiple demo files in a folder - Example Demo files for all of our tested sample environments (for future regression testing)	5 years ago
GitHub	a5b7cf95	Fix get_value_estimate and buffer append (#2276) Fixes shuffling issue with newer versions of numpy (#1798). * make get_value_estimates output a dict of floats * Use np.append instead of convert to list, unconvert * Add type hints and test for get_value_estimates	5 years ago
Chris Elion	dfdf7b83	fix whitespace and line breaks	5 years ago
GitHub	be4292fb	Add different types of visual encoder (nature cnn/resnet) Add resnet and nature cnn in addition to default visual encoder	5 years ago
GitHub	6a212f73	Improvements for GAIL (#2296) * Don't 0 value bootstrap for GAIL and Curiosity * Add gradient penalties to GAN to help with stability * Add gail_config.yaml with GAIL examples * Cleaned up trainer_config.yaml and unnecessary gammas * Documentation updates * Code cleanup	5 years ago
GitHub	6225317d	refactor vis_encoder_type and add to doc refactor vis_encoder_type and add to doc	5 years ago
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 years ago
GitHub	d7ebaae1	Return list instead of np array for make_mini_batch() (#2371) Return list instead of np array for make_mini_batch() to reduce time copying data	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
GitHub	bd7eb286	Update reward signals in parallel with policy (#2362)	5 years ago
GitHub	3683cc1c	Enable learning rate decay to be disabled (#2567)	5 years ago
GitHub	832e4a47	Normalize observations when adding experiences (#2556) * Normalize observations when adding experiences This change moves normalization of vector observations into the trainer's "add_experiences" interface. Prior to this change, normalization occurred at inference time. This was somewhat confusing since usually executing a forward pass shouldn't have side-effects which would change the training step. Also, in a asynchronous or distributed setting where we copy the neural network weights from a trainer to a remote actor / inference worker we'd end up with training issues because of the weights being different on the trainer than the workers.	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
GitHub	cb144f20	small mypy cleanup (#2637) * small mypy cleanup * sac cleanup * types for ppo policy init	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
Ervin Teng	12a1e306	start on tf2 policy	5 years ago
Ervin Teng	e185844f	Start on TF 2 policy	5 years ago
GitHub	0fe5adc2	Develop remove memories (#2795) * Initial commit removing memories from C# and deprecating memory fields in proto * initial changes to Python * Adding functionalities * Fixes * adding the memories to the dictionary * Fixing bugs * tweeks * Resolving bugs * Recreating the proto * Addressing comments * Passing by reference does not work. Do not merge * Fixing huge bug in Inference * Applying patches * fixing tests * Addressing comments * Renaming variable to reflect type * test	5 years ago
Chris Elion	691d21e6	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
Ervin Teng	987e0e3a	Merge tf2 branch	5 years ago
Ervin Teng	748c250e	Somewhat running	5 years ago
Ervin Teng	9dbbfd77	Somewhat running	5 years ago
Ervin Teng	5e6de46f	Add normalizer	5 years ago
Ervin Teng	5e1c1a00	Tweaks to Policy	5 years ago
Ervin Teng	a665daed	It's mostly training	5 years ago
Ervin Teng	3eb1e9c2	Pytorch port of continuous PPO	5 years ago
Ervin Teng	d46b60b3	Add ReLU to the dense	5 years ago
Ervin Teng	ed2c35b9	Remove some comments	5 years ago
Ervin Teng	135a5bb4	Add dummy save methods	5 years ago
GitHub	69d1a033	Develop remove past action communication (#2913) * Modifying the .proto files * attempt 1 at refactoring Python * works for ppo hallway * changing the documentation * now works with both sac and ppo both training and inference * Ned to fix the tests * TODOs : - Fix the demonstration recorder - Fix the demonstration loader - verify the intrinsic reward signals work - Fix the tests on Python - Fix the C# tests * Regenerating the protos * fix proto typo * protos and modifying the C# demo recorder * modified the demo loader * Demos are loading * IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT * Modified all the demo files * Fixing all the tests * fixing ci * addressing comments * removing reference to memories in the ll-api	5 years ago
Ervin Teng	437c6c2f	Add dummy save methods	5 years ago
Ervin Teng	d983a636	Speed up a bit faster	5 years ago
Ervin Teng	3a4fa244	Switch to tanh squash in PPO	5 years ago
GitHub	681093cf	cherry pick PR#3032 (#3066)	5 years ago
Ervin Teng	9e661f0c	Looks like it's training	5 years ago
Ervin Teng	eb4a04a5	Merge branch 'master' into develop-tanhsquash	5 years ago
GitHub	3b4b0d55	Remove random normal epsilon (#3039)	5 years ago
Ervin Teng	f94365a2	No longer using ProcessingBuffer for PPO	5 years ago
Ervin Teng	8b3b9e6c	Move trajectory and related functions to trajectory.py	5 years ago
Ervin Teng	88b1123a	Merge branch 'master' of github.com:Unity-Technologies/ml-agents into develop-agentprocessor	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
Ervin Teng	c7632aa7	Fix some bugs for visual obs	5 years ago
GitHub	1fa07edb	Remove Standalone Offline BC Training (#2969)	5 years ago
Ervin Teng	5ab2563b	Fixes for recurrent	5 years ago
Chris Elion	fdc810ff	move (first pass)	5 years ago
Ervin Teng	27c2a55b	Lots of test fixes	5 years ago
Ervin Teng	97d66e71	Remove BootstrapExperience	5 years ago
Ervin Teng	324d217b	Move agent_id to Trajectory	5 years ago
Ervin Teng	77ff4822	Add back next_obs	5 years ago
Ervin Teng	2b811fc8	Properly report value estimates and episode length	5 years ago
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067)	5 years ago
Ervin Teng	c330f6f6	Merge branch 'master' into develop-agentprocessor	5 years ago
Ervin Teng	1bd791e5	Merge branch 'master' into develop-agentprocessor	5 years ago
GitHub	45010af3	Add stats reporter class and re-enable missing stats (#3076)	5 years ago
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207)	5 years ago
Ervin Teng	cd74e51b	More progress	5 years ago
Ervin Teng	2b63415e	Clean up policy files	5 years ago
Ervin Teng	9ad99eb6	Combined model and policy for PPO	5 years ago
Ervin Teng	e912fa47	Simplify creation of optimizer, breaks multi-GPU	5 years ago
Ervin Teng	164732a9	Move optimizer creation to Trainer, fix some of the reward signals	5 years ago
Ervin Teng	151e3b1c	Move policy to common location, remove epsilon	5 years ago
Ervin Teng	d9fe2f9c	Unified policy	5 years ago
Ervin Teng	0ef40c08	SAC CC working	5 years ago
Ervin Teng	1b6e175c	Fix discrete SAC and clean up policy	5 years ago
Ervin Teng	edeceefd	Zeroed version of LSTM working for PPO	5 years ago
Ervin Teng	649c4185	Zero out memory	5 years ago
Ervin Teng	7f53bf8b	Cleanup LSTM code	5 years ago
Ervin Teng	4871f49c	Fix comments for PPO	5 years ago
Ervin Teng	cfc2f455	Fix BC and tests	5 years ago
Ervin Teng	78671383	Move initialization call around	5 years ago
Ervin Teng	cadf6603	Fix SAC CC and some reward signal tests	5 years ago
GitHub	dd86e879	Separate out optimizer creation and policy graph creation (#3355)	5 years ago
Ervin Teng	1f094da9	Fix policy's scoping	5 years ago
Ervin Teng	cdd57468	Re-fix scoping and add method to get all variables	5 years ago
Ervin Teng	2eda5575	Fix discrete scoping	5 years ago
Ervin Teng	1407db53	Fix Barracuda export for LSTM	5 years ago
Ervin Teng	328476d8	Move check for creation into nn_policy	5 years ago
Ervin Teng	7d5c1b0b	Add docstring and make some methods private	5 years ago
Ervin Teng	441e6a0c	Add typing to optimizer, rename self.tf_optimizer	5 years ago
Ervin Teng	ffdc41bb	Removed floating constants	5 years ago
Ervin Teng	8abd4129	Clean up nn_policy	5 years ago
Ervin Teng	7c0fa1c4	Remove action_holder placeholder	5 years ago
Ervin Teng	c9fbb111	Fix entropy calculation	5 years ago
Ervin Teng	be9d772e	Add option to not condition sigma on obs	5 years ago
Ervin Teng	0ab7aa58	Fix tensor names	5 years ago
Ervin Teng	1cfc461a	Remove and rename tf_optimizer	5 years ago
Ervin Teng	63463bd1	Make TF graph seed deterministic	5 years ago
Ervin Teng	14f2a7f2	Rename LearningModel to ModelUtils	5 years ago
Ervin Teng	1156b9b3	Merge branch 'develop-splitpolicyoptimizer' into develop-removeactionholder	5 years ago
Ervin Teng	d57124b4	Merge 'master' into develop-removeactionholder	5 years ago
Ervin Teng	d6eb262c	Rename resample to reparameterize	5 years ago
Ervin Teng	242e2421	Move encoder creation to separate function	5 years ago
Ervin Teng	53c25fb1	Move one-hot out of policy and remove selected_actions	5 years ago
Ervin Teng	a73704bc	Remove previous action from policy	5 years ago

107 commits (1db18bd6-65c7-4859-bf35-891bb0856880)