ml-agents

Author	SHA1	Message	Commit date
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	7 years ago
GitHub	e11dae1d	Python Testing & Image Inference Improvements (#353) * Reorganized python tests into separate folder, and make individual test files for different (sub) modules. * Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon. * Minor bug fixes discovered while writing tests. * Reworked GridWorld to reset much faster. * Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.	7 years ago
eshvk	030ac5c5	[cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups.	7 years ago
GitHub	9ad4182e	Merge pull request #366 from Unity-Technologies/feature/cleanup [cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups.	7 years ago
Arthur Juliani	c3644f56	Buffer fix for properly masking gradients	7 years ago
GitHub	f8d27dc5	Merge branch 'development-0.3' into feature/LSTM2	7 years ago
GitHub	2bba53b8	Merge pull request #367 from Unity-Technologies/feature/LSTM2 Hallway & LSTM Improvements	7 years ago
GitHub	99103b29	Use `curr_brain_info`	7 years ago
GitHub	f134016b	On Demand Decision (#308) * On Demand Decision : Use RequestDecision and RequestAction * New Agent Inspector : Use it to set On Demand Decision * New BrainParameters interface * LSTM memory size is now set in python * New C# API * Semantic Changes * Replaced RunMDP * New Bouncer Environment to test On Demand Decision	7 years ago
GitHub	dcf58f75	Feature/previous text action (#375) * [Previous Text Actions] Renamed previous_action to previous_vector_action added previous_text_action to the BrainInfo * [Semantics] Carried the modifications to the semantics of previous_vector_action to the trainers	7 years ago
GitHub	e0d5b1b0	Fix for when not using teacher helper (#379) * Fix for when not using teacher helper * Rename expert to teacher throughout	7 years ago
GitHub	a7c9096f	[Semantics] Modified the placeholder names (#381)	7 years ago
Vincent Gao	02df3b34	resolved conflicts	7 years ago
GitHub	5bdef358	[Fix] Must take mean of entropy to avoid errors when the number of agents changes during training (#407)	7 years ago
Marwan Mattar	ba6911c3	Merge branch 'development-0.3' into dev-api-doc-academy # Conflicts: # unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs # unity-environment/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs # unity-environment/Assets/ML-Agents/Scripts/Academy.cs	7 years ago
GitHub	848b8a58	Fix PPO regression (#434) * Fix PPO regression	7 years ago
Joe Ward	9163a54a	resolved merge conflict with dev-0.3 branch	7 years ago
vincentpierre	e5a59e9b	[Refactor] renamed is_continuous to is_continuous_action and added is_continuous_observation to decrease confusion	7 years ago
eshvk	2d2eb64b	[containers] Enables container support for scenes that use visual observations	7 years ago
GitHub	74064891	Merge pull request #520 from Unity-Technologies/feature-trainer-ppo-is-continuous Feature trainer ppo is continuous	7 years ago
GitHub	e43c069e	Merge pull request #547 from Unity-Technologies/develop-feature-docker-improvements [containers] Enables container support for scenes that use visual observations	7 years ago
GitHub	237b41f9	Hotfix 0.3.0c (#618) Fixes the following issues: * Missing component reference in BananaRL environment. * Neural Network for multiple visual observations was not properly generated. * Episode time-out value estimate bootstrapping used incorrect observation as input.	7 years ago
GitHub	78d411f6	Merge pull request #619 from Unity-Technologies/develop Release v0.3.1	7 years ago
GitHub	1a449e98	Hotfix 0.3.1b (#637) * [Fix] Use the stored agent info instead of the previous agent info when bootstrapping the value * [Bug Fix] Addressed #643 * [Added Line Break]	7 years ago
vincentpierre	076c8744	Report means instead of totals for losses (#580) * Report means instead of totals for losses. * Report absolute loss for policy.	7 years ago
GitHub	b2675216	Hotfix 0.3.1b (#656) * [Fix] Use the stored agent info instead of the previous agent info when bootstrapping the value * [Bug Fix] Addressed #643 * [Added Line Break]	7 years ago
GitHub	755be43e	[Cold Fix] Making the episode length and mean reward more accurate for the first episode (#657)	7 years ago
GitHub	3b866e9f	Use Clipped Gaussian (#649) This PR makes the following changes: * Moves clipping of continuous control model into model itself. Output is now always [-1, 1]. * Internal model values are now clipped between [-3, 3] before being rescaled to [-1, 1] for output. * This improves training performance by providing a wider range of values within which the pdf of the gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly. * Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance. * Introduce ScaleAction() function within python to easily rescale values from [-1, 1] to arbitrary range. * Re-train all CC models using improved algorithm. All performance levels are equal or improved. In the case of Crawler, improvement is drastic. * Update documentation appropriately. * Made miscellaneous minor code style and optimization improvements within environments.	7 years ago
Arthur Juliani	9477eaa9	Develop fix cumulative reward (#725) * [Cold Fix] Split the way cumulative rewards and episode length are counted. The reward is appended at each step to the cumulative reward. The episode count is ONLY incremented when d_t+1 is false	7 years ago
GitHub	702d98c6	[Fix] The summary writer is now implemented in the abstract trainer class. (#806) Summary writer now displays {}: Step: {}. No episode was completed since last summary. when there were no completed episodes	7 years ago
GitHub	c17937ef	Curiosity Driven Exploration & Pyramids Environments (#739) * Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer. * To enable, set use_curiosity flag to true in hyperparameter file. * Includes refactor of unitytrainers model code to accommodate new feature. * Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.	7 years ago
vincentpierre	a22c0f65	[fixing encoding_size]	7 years ago
Arthur Juliani	d7338050	Enable concurrent sessions	7 years ago
Arthur Juliani	5d402be9	Minor Optimizations (#836)	7 years ago
GitHub	8526dcfc	Fix for visual observations (#847)	7 years ago
GitHub	0f65e272	[Addresses #842] (#849) If the agent is done immediately after spawning, its stats are empty because at least 2 successive experiences are needed to create the stats. By specifying the default value of 0, the error no longer appears	7 years ago
GitHub	47fc38ab	Additional Tests & Bug Fixes (#854) * Add tests and fix for sparse tensor warning * Rename mock communicator parameter * Test longer sequences * Curiosity tests and bug fixes	7 years ago
GitHub	6e6e8d96	Fix for CC models w/ RNN and Curiosity (#860)	7 years ago
GitHub	b5722dc9	Fix for visual observation w/ curiosity (#873)	7 years ago
vincentpierre	4c6439d5	[Attempted fix]	7 years ago
GitHub	6df07946	Fix for Discrete observations + Curiosity (#866)	7 years ago
GitHub	68d6170f	Error message when using ODD and Curiosity (#883) * Remove extra bouncer brain hyperparameters * Add error when using curiosity+odd	7 years ago
GitHub	bf858cd6	Merge pull request #884 from Unity-Technologies/release-v0.4 Release v0.4	7 years ago
GitHub	4b3c6c9f	Merge pull request #885 from Unity-Technologies/release-v0.4 Release v0.4	7 years ago
Arthur Juliani	5e48766d	Remove discrete observations	7 years ago
Arthur Juliani	b46b8708	Rename function	7 years ago
GitHub	b6fe0bca	Merge pull request #906 from Unity-Technologies/develop-no-discrete-obs Remove Discrete Observations	6 years ago
Arthur Juliani	195ac934	Merge branch 'develop' into develop-runs # Conflicts: # python/learn.py # python/unitytrainers/trainer.py	6 years ago
vincentpierre	e47cec56	[Initial Commit]	6 years ago
unityjeffrey	0d67f311	changed ml agents to ml-agents	6 years ago
unityjeffrey	19fb437a	changed to Unity ML-Agents Toolkit (english)	6 years ago
GitHub	7b9a2905	Merge pull request #916 from Unity-Technologies/hotfix-trademarkupdate update for trademark and consistency of ml-agents	6 years ago
Arthur Juliani	9701c3db	Merge branch 'hotfix-0' into release-v0.4-fix-curiosity-odd # Conflicts: # python/unitytrainers/ppo/trainer.py	6 years ago
Arthur Juliani	0c6411c2	Use switch between old and new behavior	6 years ago
Arthur Juliani	1bfbf67a	Simplify approach	6 years ago
Arthur Juliani	cfb7cfef	Code clean-up	6 years ago
Arthur Juliani	083cbff5	Add to docstring	6 years ago
Arthur Juliani	c31f63b5	Fix typo	6 years ago
GitHub	3b5af6b2	Merge pull request #937 from Unity-Technologies/release-v0.4-fix-curiosity-odd Hotfix - Curiosity & ODD	6 years ago
GitHub	f155d661	Merge pull request #908 from Unity-Technologies/hotfix-0 Release v0.4.0a	6 years ago
GitHub	e50ac7ae	Merge branch 'develop' into hotfix-0	6 years ago
GitHub	b36e6a2e	Merge pull request #946 from Unity-Technologies/hotfix-0 v0.4.0a into Develop	6 years ago
Deric Pang	8380f2f2	Moved curriculum code out of environment code.	6 years ago
Deric Pang	ae944381	Removing print statements.	6 years ago
Deric Pang	798c8bf9	Removing print statements.	6 years ago
GitHub	2d715dc5	Revert "Release v0.5 (#1202)" (#1221) This reverts commit 983c4029cb435fc7ad27a796e79a1d59904e53e5.	6 years ago
GitHub	4e73f770	Merge branch 'develop' into hotfix-0.4b	6 years ago
Arthur Juliani	1eb701af	Merge remote-tracking branch 'origin/develop' into develop-value-estimates-ppo	6 years ago
Arthur Juliani	f52d5a92	Merge remote-tracking branch 'origin/develop' into develop-runs	6 years ago
GitHub	1e21c143	Merge pull request #934 from Unity-Technologies/develop-value-estimates-ppo Develop value estimates ppo	6 years ago
GitHub	ef3025e6	Merge pull request #1004 from Unity-Technologies/develop-runs Enable multiple runs in learn.py	6 years ago
GitHub	7d0990cf	Fix MultiBrain bug that was introduced with the value estimates (#1018)	6 years ago
Arthur Juliani	52865022	[Fix bug 1040] (#1062)	6 years ago
Deric Pang	6eba6940	Merge remote-tracking branch 'upstream/develop' into develop-trainer-controller-cleanup	6 years ago
Arthur Juliani	3659bbcd	Develop multi discrete (#1022) Replace discrete control with multi-discrete control.	6 years ago
Arthur Juliani	fee02a84	Attempted fix for #1059 (#1089)	6 years ago
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 years ago
Arthur Juliani	17224292	Fix for Curiosity with ODD (#1107) This branch addresses the issue referenced in #1059	6 years ago
GitHub	ded0d8c7	Develop action masking (#1080) * [Initial Commit] Modified the model.py file and the ppo/trainer.py file to use masked actions * Preliminary modifications to the python side of the code to enable action masking * Preliminary modifications to the C# side of the code to enable action masking * Preliminary modifications to the communication side of the code to enable action masking * Implemented action masking for BC Note : The actions of the teacher are not masked * More error messages for the action masking * fix pytests * Added Documentation * Address comment * Addressed Comments on docs * Addressed second comment on docs * Addressed comments for the python side of the code * Created the action masker and associated unit tests * Addressed comments on the C# side * Addressed the comment regarding action_masking_name * Addressed the comments	6 years ago
Deric Pang	e55b1764	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
Deric Pang	e0e02ae6	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
Deric Pang	cdb41480	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
GitHub	3900ed66	Merge pull request #1083 from Unity-Technologies/develop-flat-code-restructure ML-Agents Code Restructure	6 years ago
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	f8df71a0	Revert "Release v0.5 (Develop) (#1203)" (#1222) This reverts commit 448aac65dc891bad04a23a02d275f6a1d2704e1e.	6 years ago
GitHub	ab5c49e8	Release v0.5 delete unityagents (#1151) * fixed the loggers * Modified the documentation	6 years ago
GitHub	2d4b4209	Use single scope declaration for models (#1160)	6 years ago
GitHub	29084e77	Curriculum learning reward thresholding bug fix (#1141)	6 years ago
GitHub	25495874	Merge pull request #1223 from Unity-Technologies/release-v0.5 Release v0.5	6 years ago
GitHub	560f1bd7	Merge pull request #1224 from Unity-Technologies/release-v0.5 Release v0.5	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
GitHub	3c9603d6	Demonstration Recorder (#1240)	6 years ago
GitHub	840417ff	Use organized tags for tensorboard stats (#1248)	6 years ago
GitHub	6c354d16	New Learning Brain (#1303) * Initial Commit * attempt at refactor * Put all static methods into the CoreInternalBrain * improvements * more testing * modifications * renamed epsilon * misc * Now supports discrete actions * added discrete support and RNN and visual. Left to do is refactor and save variables into models * code cleaning * made a tensor generator and applier * fix on the models.py file * Moved the Checks to a different Class * Added some unit tests * BugFix * Need to generate the output tensors as well as inputs before executing the graph * Made NodeNames static and created a new namespace * Added comments to the TensorAppliers * Started adding comments on the TensorGenerators code * Added comments for the Tensor Generator * Moving the helper classes into a separate folder * Added initial comments to the TensorChecks * Renamed NodeNames -> TensorNames * Removing warnings in tests * Now using Aut...	6 years ago
vincentpierre	1045b6e7	Fix continuous curiosity	6 years ago
GitHub	547f0e98	Merge pull request #1361 from Unity-Technologies/release-v0.6 Merge Release v0.6 into develop	6 years ago
GitHub	a196dde2	Merge pull request #1494 from Unity-Technologies/release-v0.6 v0.6 Release	6 years ago
GitHub	b6c97cb6	Fix for divide-by-zero error with Discrete Actions (#1520) * Enable buffer padding to be set other than 0 Allows buffer padding in AgentBufferField to be set to a custom value. In particular, 0-padding for `action_masks` causes a divide-by-zero error, and should be padded with 1’s instead. This is done as a parameter passed to the `append` method, so that the pad value can be set right after the instantiation of an AgentBufferField.	6 years ago
GitHub	8b1f0a38	Merge pull request #1589 from Unity-Technologies/hotfix-0.6.0a Hotfix 0.6.0a to develop	6 years ago
GitHub	c0c289cc	Merge pull request #1588 from Unity-Technologies/hotfix-0.6.0a Hotfix 0.6.0a to master	6 years ago
GitHub	c258b1c3	Move 'take_action' into Policy class (#1669) * Move 'take_action' into Policy class This refactor is part of Actor-Trainer separation. Since policies will be distributed across actors in separate processes which share a single trainer, taking an action should be the responsibility of the policy. This change makes a few smaller changes: * Combines `take_action` logic between trainers, making it more generic * Adds an `ActionInfo` data class to be more explicit about the data returned by the policy, only used by TrainerController and policy for now. * Moves trainer stats logic out of `take_action` and into `add_experiences` * Renames 'take_action' to 'get_action'	6 years ago
Ervin T	b30f4c90	Split `mlagents` into two packages (#1812) * Reorganize project * Fix all tests * Address comments * Delete init file * Update requirements * Tick version * Add timeout wait parameter (mlagents_envs) (#1699) * Add timeout wait param * Remove unnecessary function * Add new meta files for communicator objects * Fix all tests * update circleci * Reorganize mlagents_envs tests * WIP: test removing circleci cache * Move gym tests * Namespaced packages * Update installation instructions for separate packages * Remove unused package from setup script * Add Readme for ml-agents-envs * Clarify docs and re-comment compiler in make.bat * Add more doc to installation * Add back fix for Hololens * Recompile Protobufs * Change mlagents_envs to mlagents.envs in trainer_controller * Remove extraneous files, fix win bat script * Support Python 3.7 for envs package	6 years ago
eshvk	cc9bdf17	Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return	6 years ago
eshvk	fb04c40c	Reorganize to make metrics collection more accurate	6 years ago
GitHub	a0b44f1b	Merge pull request #1858 from Unity-Technologies/develop-esh-metrics Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return per policy	6 years ago
GitHub	93760bc4	Adds SubprocessUnityEnvironment for parallel envs (#1751) This commit adds support for running Unity environments in parallel. An abstract base class was created for UnityEnvironment which a new SubprocessUnityEnvironment inherits from. SubprocessUnityEnvironment communicates through a pipe in order to send commands which will be run in parallel to its workers. A few significant changes needed to be made as a side-effect: * UnityEnvironments are created via a factory method (a closure) rather than being directly created by the main process. * In mlagents-learn "worker-id" has been replaced by "base-port" and "num-envs", and worker_ids are automatically assigned across runs. * BrainInfo objects now convert all fields to numpy arrays or lists to avoid serialization issues.	6 years ago
GitHub	2d1bda57	Merge pull request #1931 from Unity-Technologies/release-v0.8 Release v0.8	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	70d14910	Merge pull request #1934 from Unity-Technologies/develop-black Black formatting	6 years ago
Vincent(Yuan) Gao	a15763f8	Clear cumulative_returns_since_policy_update (#2120) Before the CSV file's mean rewards would lag much behind the rest of the code since this buffer was never cleared.	6 years ago
GitHub	a4d5b2d3	Doc/comment cleanup - Fix some occurrences of 'the the' (#2119)	6 years ago
GitHub	d5f6b7f8	Merge pull request #2157 from Unity-Technologies/release-v0.8.2 Release v0.8.2	5 years ago
GitHub	2671e1a0	Enable mypy in precommit checks (#2177) * WIP precommit on top level * update CI * circleci fixes * intentionally fail black * use --show-diff-on-failure in CI * fix command order * rebreak a file * apply black * WIP enable mypy * run mypy on each package * fix trainer_metrics mypy errors * more mypy errors * more mypy * Fix some partially typed functions * types for take_action_outputs * fix formatting * cleanup * generate stubs for proto objects * fix ml-agents-env mypy errors * disallow-incomplete-defs for gym-unity * Add CI notes to CONTRIBUTING.md	5 years ago
GitHub	40c7fc48	Merge branch 'develop' into protobuf_update	5 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	5 years ago
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 years ago
Chris Elion	bb7773c1	add flake8 to precommit	5 years ago
GitHub	84d9d622	python timers (#2180) * Timer proof-of-concept * micro optimizations * add some timers * cleanup, add asserts * Cleanup (no start/end methods) and handle exceptions * unit test and decorator * move output code, add a decorator * cleanup * module docstring * actually write the timings when done with training * use __qualname__ instead * add a few more timers * fix mock import * fix unit test * don't need fwd reference * cleanup root * always write timers, add comments * undo accidental change	5 years ago
GitHub	9c50abcf	GAIL and Pretraining (#2118) Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes: - A new GAILRewardSignal and GAILModel for GAIL/VAIL - A BCModule component (not a reward signal) to do pretraining during RL - Documentation for both of these - Change to Demo Loader that lets you load multiple demo files in a folder - Example Demo files for all of our tested sample environments (for future regression testing)	5 years ago
GitHub	1c18bd18	Swap 0 set and reward buffer append (#2273) Fix bug with reward_buffer always 0	5 years ago
GitHub	a5b7cf95	Fix get_value_estimate and buffer append (#2276) Fixes shuffling issue with newer versions of numpy (#1798). * make get_value_estimates output a dict of floats * Use np.append instead of convert to list, unconvert * Add type hints and test for get_value_estimates	5 years ago
Chris Elion	5d07ca1f	Merge remote-tracking branch 'origin/develop' into enable-flake8	5 years ago
Chris Elion	dfdf7b83	fix whitespace and line breaks	5 years ago
GitHub	f8041534	Merge pull request #2236 from Unity-Technologies/enable-flake8 Enable flake8	5 years ago
GitHub	be4292fb	Add different types of visual encoder (nature cnn/resnet) Add resnet and nature cnn in addition to default visual encoder	5 years ago
GitHub	6a212f73	Improvements for GAIL (#2296) * Don't 0 value bootstrap for GAIL and Curiosity * Add gradient penalties to GAN to help with stability * Add gail_config.yaml with GAIL examples * Cleaned up trainer_config.yaml and unnecessary gammas * Documentation updates * Code cleanup	5 years ago
GitHub	9eb3f049	Cleanup unused code in TrainerController (#2315) * Removes unused SubprocessEnvManager import in trainer_controller * Removes unused `steps` argument to `TrainerController._save_model` * Consolidates unnecessary branching for curricula in `TrainerController.advance` * Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds `BCTrainer` support so that we don't have a broken interface / undefined behavior when BCTrainer is used with curricula.	5 years ago
GitHub	6225317d	refactor vis_encoder_type and add to doc refactor vis_encoder_type and add to doc	5 years ago
GitHub	53475207	Merge pull request #2380 from Unity-Technologies/release-0.9.0 Release v0.9.0	5 years ago
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 years ago
GitHub	d7ebaae1	Return list instead of np array for make_mini_batch() (#2371) Return list instead of np array for make_mini_batch() to reduce time copying data	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
Ervin Teng	072d2ef8	Merge latest develop	5 years ago
GitHub	bd7eb286	Update reward signals in parallel with policy (#2362)	5 years ago
GitHub	689765d6	Modification of reward signals and rl_trainer for SAC (#2433) * Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo. * Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals. * Moves end_episode to rl_trainer * Fixed bug with BCModule with RNN	5 years ago
GitHub	43696d60	Fix bug in add_rewards_output and add test (#2442)	5 years ago
GitHub	0a163871	Merge pull request #2469 from Unity-Technologies/release-0.9.2 Release 0.9.2	5 years ago
GitHub	3683cc1c	Enable learning rate decay to be disabled (#2567)	5 years ago
GitHub	832e4a47	Normalize observations when adding experiences (#2556) * Normalize observations when adding experiences This change moves normalization of vector observations into the trainer's "add_experiences" interface. Prior to this change, normalization occurred at inference time. This was somewhat confusing since usually executing a forward pass shouldn't have side-effects which would change the training step. Also, in an asynchronous or distributed setting where we copy the neural network weights from a trainer to a remote actor / inference worker we'd end up with training issues because of the weights being different on the trainer than the workers.	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
GitHub	36ed3c16	Fix issue exporting graph with multi-GPU (#2573) Our multi-GPU training had a regression such that freezing the graph was broken. This change fixes that issue by making a few changes: * Removes the top level "tower" variable scope added by multi-GPU so that the output nodes have correct names * Removes the use of "freeze_graph" and replaces it with our own similar functionality. * Adds the "auto reuse" to network layers which require them	5 years ago
GitHub	cb144f20	small mypy cleanup (#2637) * small mypy cleanup * sac cleanup * types for ppo policy init	5 years ago
Jonathan Harper	3fc14963	EXPERIMENTAL horovod support	5 years ago
Jonathan Harper	47893e9c	minor tweaks	5 years ago
GitHub	b2fa2268	Merge pull request #2648 from Unity-Technologies/release-0.10.0 Release 0.10.0	5 years ago
GitHub	8e931d8d	Merge branch 'develop' into release-0.10.0	5 years ago
Ervin Teng	094cbe4d	Fix bug when batch size is a non-multiple of sequence length (#2661)	5 years ago
Anupam Bhatnagar	cc208c00	resolving conflicts	5 years ago
GitHub	b2a2047e	Fix bug when batch size is a non-multiple of sequence length (#2661)	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Ervin Teng	024e3677	small mypy cleanup (#2637) * small mypy cleanup * sac cleanup * types for ppo policy init	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
GitHub	f22c41db	Merge pull request #2704 from Unity-Technologies/hotfix-0.10.1 Merge Hotfix 0.10.1	5 years ago
Anupam Bhatnagar	b733b34c	resolving conflicts	5 years ago
Chris Elion	a1967c19	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
GitHub	5d3e05d1	Fix "memory leak" during inference (#2722) * Clear buffer if not training * Add tests	5 years ago
Ervin Teng	12a1e306	start on tf2 policy	5 years ago
Ervin Teng	e185844f	Start on TF 2 policy	5 years ago
Chris Elion	3d8a70fb	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
GitHub	0fe5adc2	Develop remove memories (#2795) * Initial commit removing memories from C# and deprecating memory fields in proto * initial changes to Python * Adding functionalities * Fixes * adding the memories to the dictionary * Fixing bugs * tweaks * Resolving bugs * Recreating the proto * Addressing comments * Passing by reference does not work. Do not merge * Fixing huge bug in Inference * Applying patches * fixing tests * Addressing comments * Renaming variable to reflect type * test	5 years ago
GitHub	495873e5	Merge pull request #2833 from Unity-Technologies/release-0.11.0 Release 0.11.0	5 years ago
Chris Elion	691d21e6	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
GitHub	c6c01a03	Enable pylint and fix a few things (#2767) * enable pylint, disable some messages and fix a few * SAC memories in init	5 years ago
Jonathan Harper	8550679d	Merge branch 'develop' into release-0.11.0	5 years ago
GitHub	4da157fe	more pylint fixes (#2842)	5 years ago
Chris Elion	fca51de8	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
GitHub	bf68edcf	ignore attribute-defined-outside-init in multi_gpu_policy (#2876)	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
GitHub	f57b7ac6	Allow usage with tensorflow 2.0.0 (via tf.compat.v1) (#2665)	5 years ago
Chris Elion	7353ad22	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Ervin Teng	987e0e3a	Merge tf2 branch	5 years ago
Ervin Teng	748c250e	Somewhat running	5 years ago
Andrew Cohen	13fe9cf8	Bubbled up indexing of AllBrainInfo to trainer controller from trainers	5 years ago
Ervin Teng	9dbbfd77	Somewhat running	5 years ago
Ervin Teng	5e6de46f	Add normalizer	5 years ago
GitHub	c0453ae1	Merge pull request #2912 from Unity-Technologies/develop-allbraininfo Bubbled up indexing of AllBrainInfo to trainer controller from trainers	5 years ago
Ervin Teng	5e1c1a00	Tweaks to Policy	5 years ago
GitHub	99981937	fix errors from new flake8-comprehensions (#2917)	5 years ago
Ervin Teng	a665daed	It's mostly training	5 years ago
Ervin Teng	3eb1e9c2	Pytorch port of continuous PPO	5 years ago
Ervin Teng	d46b60b3	Add ReLU to the dense	5 years ago
Ervin Teng	ed2c35b9	Remove some comments	5 years ago
Ervin Teng	135a5bb4	Add dummy save methods	5 years ago
GitHub	69d1a033	Develop remove past action communication (#2913) * Modifying the .proto files * attempt 1 at refactoring Python * works for ppo hallway * changing the documentation * now works with both sac and ppo both training and inference * Need to fix the tests * TODOs : - Fix the demonstration recorder - Fix the demonstration loader - verify the intrinsic reward signals work - Fix the tests on Python - Fix the C# tests * Regenerating the protos * fix proto typo * protos and modifying the C# demo recorder * modified the demo loader * Demos are loading * IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT * Modified all the demo files * Fixing all the tests * fixing ci * addressing comments * removing reference to memories in the ll-api	5 years ago
Andrew Cohen	e96b80db	receives brain_name and identifier on python side	5 years ago
Ervin Teng	437c6c2f	Add dummy save methods	5 years ago
Ervin Teng	d983a636	Speed up a bit faster	5 years ago
Ervin Teng	54644477	Merge branch 'develop' of github.com:Unity-Technologies/ml-agents into develop-nomaxstep-test	5 years ago
Ervin Teng	df5ee7bf	Split buffer into two buffers (PPO works)	5 years ago
Ervin Teng	3a4fa244	Switch to tanh squash in PPO	5 years ago
Ervin Teng	fd0647a6	Rename append_update_buffer to append_to_update_buffer	5 years ago
Andrew Cohen	bd056007	receives brain_name and identifier on python side	5 years ago
GitHub	d4780a55	Merge pull request #3010 from Unity-Technologies/release-0.12.0-to-master Merge Release 0.12.0 to master	5 years ago
GitHub	652488d9	check for numpy float64 (#2948)	5 years ago
GitHub	681093cf	cherry pick PR#3032 (#3066)	5 years ago
GitHub	213cd68d	Split Buffer into processing and update buffers (#2964 ) This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps: - Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR) - Move the processing buffer and add/process experiences into a separate, outside class - Change the data type of the update buffer to be a Trajectory - Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers	5 年前
Ervin Teng	34f9577c	Merge branch 'develop' into develop-agentprocessor	5 年前
Ervin Teng	2c9376bc	Convert to trajectory	5 年前
Ervin Teng	9e661f0c	Looks like it's training	5 年前
GitHub	ef2514ba	Develop cold fix recurrent (#3032 ) * Fixing the value estimate with recurrent * fix typing * Fix type check	5 年前
GitHub	35c995e9	Merge pull request #3038 from Unity-Technologies/develop Merge develop to master	5 年前
Ervin Teng	a97ffb47	Attempt reward reporting	5 年前
Ervin Teng	9c5fdd31	Stats reporting is working	5 年前
Ervin Teng	eb4a04a5	Merge branch 'master' into develop-tanhsquash	5 年前
GitHub	3b4b0d55	Remove random normal epsilon (#3039 )	5 年前
Ervin Teng	e0e57188	Clean up some stuff	5 年前
Ervin Teng	b501f75b	reduce sum to do squashing properly	5 年前
Andrew Cohen	5097bcc0	receives brain_name and identifier on python side	5 年前
Ervin Teng	f94365a2	No longer using ProcessingBuffer for PPO	5 年前
Ervin Teng	8b3b9e6c	Move trajectory and related functions to trajectory.py	5 年前
Ervin Teng	76abf968	Add back max_step logic	5 年前
Ervin Teng	88b1123a	Merge branch 'master' of github.com:Unity-Technologies/ml-agents into develop-agentprocessor	5 年前
Andrew Cohen	8578b0b7	add_policy and create_policy separated	5 年前
GitHub	36048cb6	Moving Env Manager to Trainers (#3062 ) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 年前
Ervin Teng	c9116ed2	Move some common logic to buffer class	5 年前
GitHub	90db165f	Add --namespace-packages to mypy for mlagents (#3075 )	5 年前
Ervin Teng	c7632aa7	Fix some bugs for visual obs	5 年前
GitHub	1fa07edb	Remove Standalone Offline BC Training (#2969 )	5 年前
Andrew Cohen	614d276f	receives brain_name and identifier on python side	5 年前
Ervin Teng	5ab2563b	Fixes for recurrent	5 年前
Andrew Cohen	96922f84	receives brain_name and identifier on python side	5 年前
Chris Elion	fdc810ff	move (first pass)	5 年前
GitHub	58b6c7c2	Rename mlagents.envs to mlagents_envs (#3083 )	5 年前
Ervin Teng	27c2a55b	Lots of test fixes	5 年前
Ervin Teng	97d66e71	Remove BootstrapExperience	5 年前
Ervin Teng	324d217b	Move agent_id to Trajectory	5 年前
Ervin Teng	77ff4822	Add back next_obs	5 年前
Andrew Cohen	d1edbf43	add_policy and create_policy separated	5 年前
Ervin Teng	2b811fc8	Properly report value estimates and episode length	5 年前
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067 )	5 年前
Ervin Teng	c330f6f6	Merge branch 'master' into develop-agentprocessor	5 年前
Andrew Cohen	de902fbb	passes all pytest and C# tests	5 年前
GitHub	2ac242f7	Remove TrainerMetrics and add CSVWriter using new StatsWriter API (#3108 )	5 年前
Ervin Teng	fdf9aea7	Make conversion methods part of NamedTuples	5 年前
Ervin Teng	6242b67d	Add way to check if trajectory is done or max_reached	5 年前
GitHub	0b5b1b01	Develop magic string + trajectory (#3122 ) * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * trainer_controller expects name_behavior_ids * add_policy and create_policy separated * adjusting tests to expect trainer.add_policy to be called * fixing tests * fixed naming ...	5 年前
GitHub	c7da0139	Fix mypy errors in trainer code. (#3135 )	5 年前
Andrew Cohen	082789ea	Merge branch 'master' into develop-magic-string	5 年前
Andrew Cohen	6a4e7cf9	added ppo/sac_policy attributes to keep up with master	5 年前
Ervin Teng	1bd791e5	Merge branch 'master' into develop-agentprocessor	5 年前
Andrew Cohen	3e76adbd	fixing more ci tests	5 年前
Ervin Teng	e577d5ea	Fix some mypy issues and remove unused code	5 年前
Andrew Cohen	c3a92afa	fixing ci ppo_policy	5 年前
Ervin Teng	9e0ef912	Fixed value estimate bug	5 年前
GitHub	bec2e8f0	Add Trajectory/Policy Queues, move Trainer logic to advance() (#3113 )	5 年前
Ervin Teng	db743971	Move private methods out of trainer, simplify interface	5 年前
Andrew Cohen	c8514c18	Merge branch 'master' into develop-magic-string	5 年前
GitHub	45010af3	Add stats reporter class and re-enable missing stats (#3076 )	5 年前
Ervin Teng	b3a4e641	Remove some vestigial code	5 年前
Ervin Teng	48793ec1	Fix test	5 年前
Ervin Teng	3d25f9d2	Merge branch 'master' into develop-agentprocessor	5 年前
GitHub	5bc7531b	Get step from policy (#3223 )	5 年前
GitHub	d985dded	Merge branch 'master' into merge-release-0.13.0	5 年前
Ervin Teng	35d73d1d	Split value and policy networks	5 年前
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207 )	5 年前
Ervin Teng	03c750a7	Move some functionality to optimizer	5 年前
Ervin Teng	2c1ef594	Move some functionality to optimizer-black	5 年前
Ervin Teng	6688453b	Move some functionality to optimizer-black	5 年前
Ervin Teng	91ffde5f	More incremental steps to separation	5 年前
Ervin Teng	cd74e51b	More progress	5 年前
Ervin Teng	2373cae8	Move methods into common optimizer	5 年前
Ervin Teng	76ad64d7	Some more bugfixes	5 年前
Ervin Teng	bc04f9dc	Working continuous updates	5 年前
Ervin Teng	29f3330f	Merge master into hotfix-0.13.1	5 年前
Ervin Teng	17dc17e5	Discrete PPO working	5 年前
GitHub	d52fb483	Merge pull request #3264 from Unity-Technologies/hotfix-0.13.1 Merge hotfix 0.13.1 into master	5 年前
Ervin Teng	2b63415e	Clean up policy files	5 年前
Ervin Teng	9ad99eb6	Combined model and policy for PPO	5 年前
GitHub	329b23e0	Fix extra summary being written when loading from checkpoint (#3272 ) * Load next summary properly * Add tests for add_policy and get_policy	5 年前
Ervin Teng	6baaf980	Remove PPO model	5 年前
Ervin Teng	e912fa47	Simplify creation of optimizer, breaks multi-GPU	5 年前
Ervin Teng	164732a9	Move optimizer creation to Trainer, fix some of the reward signals	5 年前
Ervin Teng	151e3b1c	Move policy to common location, remove epsilon	5 年前
Ervin Teng	d9fe2f9c	Unified policy	5 年前
Ervin Teng	0ef40c08	SAC CC working	5 年前
Ervin Teng	db249ceb	Merge branch 'master' into develop-splitpolicyoptimizer	5 年前
Ervin Teng	28f7608f	Clean up value head creation	5 年前
Ervin Teng	edeceefd	Zeroed version of LSTM working for PPO	5 年前
Ervin Teng	649c4185	Zero out memory	5 年前
Ervin Teng	7f53bf8b	Cleanup LSTM code	5 年前
Ervin Teng	5ec49542	SAC LSTM isn't broken	5 年前
Ervin Teng	7d616651	Add burn-in for memory PPO	5 年前
Ervin Teng	4871f49c	Fix comments for PPO	5 年前
Ervin Teng	cfc2f455	Fix BC and tests	5 年前
Ervin Teng	78671383	Move initialization call around	5 年前
GitHub	dd86e879	Separate out optimizer creation and policy graph creation (#3355 )	5 年前
Ervin Teng	dcbb90e1	Fix graph init in ghost trainer	5 年前
Ervin Teng	14720e2d	Remove burn-in	5 年前
Ervin Teng	328476d8	Move check for creation into nn_policy	5 年前
Ervin Teng	ce110201	Add optional burn-in for SAC as well	5 年前
Ervin Teng	cbfbff2c	Split optimizer and TFOptimizer	5 年前
Ervin Teng	4d94e180	Move optimizer to common folder	5 年前
Ervin Teng	00017bab	Temporarily remove multi-GPU	5 年前
Ervin Teng	441e6a0c	Add typing to optimizer, rename self.tf_optimizer	5 年前
Ervin Teng	ffdc41bb	Removed floating constants	5 年前
Ervin Teng	7c0fa1c4	Remove action_holder placeholder	5 年前
Ervin Teng	be9d772e	Add option to not condition sigma on obs	5 年前
Ervin Teng	30e4424c	Fix PPO optimizer creation	5 年前
Ervin Teng	ff607162	Move learning rate reporting	5 年前
Ervin Teng	88998fc9	Add add_policy docstrings	5 年前
Ervin Teng	c735e722	Make create critic methods private	5 年前
GitHub	c145e75b	Split Policy and Optimizer, common Policy for PPO and SAC (#3345 )	5 年前
Ervin Teng	da6daebd	Make create losses private	5 年前
Andrew Cohen	5b0aca29	Merge branch 'master' into soccer-fives	5 年前
Ervin Teng	14f2a7f2	Rename LearningModel to ModelUtils	5 年前
Ervin Teng	1156b9b3	Merge branch 'develop-splitpolicyoptimizer' into develop-removeactionholder	5 年前
Ervin Teng	53c25fb1	Move one-hot out of policy and remove selected_actions	5 年前
Anupam Bhatnagar	e04fcd71	Merge branch 'master' into master-into-release-0.14.1	5 年前
GitHub	97a1d4b1	[change] Remove the action_holder placeholder from the policy. (#3492 )	5 年前
Andrew Cohen	de73baa9	Merge branch 'master' into soccer-fives	5 年前
GitHub	7d954797	[change] Separate action outputs into OutputDistributions object (#3514 )	5 年前
GitHub	e4177de0	[change] Organize trainer files a bit better (#3538 )	5 年前
GitHub	870338b4	[bug-fix] Fix issue with more than one continuous actions (#3547 )	5 年前
Andrew Cohen	573b1f6d	Merge branch 'master' into soccer-fives	5 年前
GitHub	cb153a0f	[change] Change warning language when adversarial scene is used without self-play (#3561 )	5 年前
Anupam Bhatnagar	f4dbedcf	removed extraneous logging imports and loggers	5 年前
GitHub	86141eee	Merge pull request #3560 from Unity-Technologies/new-logger Add timestamps to logs	5 年前
Anupam Bhatnagar	e8e0078e	first commit	5 年前
Anupam Bhatnagar	07b15ae7	[skip-ci] small refactors	5 年前
GitHub	e3af96ca	Merge branch 'master' into develop-demo-load-seek	5 年前
GitHub	c42a11c3	[change] Throw a proper error when sequence length is greater than batch size. (#3583 )	5 年前
GitHub	94de596b	[change] Remove concatenate in discrete action probabilities to improve inference performance (#3598 )	5 年前
Andrew Cohen	b1cfa74d	Merge branch 'master' into develop-test-imitation	5 年前
GitHub	ec278616	Hotfixes for Release 0.15.1 (#3698 ) * [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting ...	5 年前
Andrew Cohen	53bea15c	Merge branch 'master' into soccer-fives	5 年前
Andrew Cohen	ac261e36	Merge branch 'master' into self-play-mutex	5 年前
GitHub	6709a9bf	[change] Clean up trainer interface, clean up GhostTrainer stats (#3634 )	5 年前
Andrew Cohen	eefc4811	Merge branch 'master' into self-play-mutex	5 年前
Andrew Cohen	9f09a65d	team id centric ghost trainer	5 年前
GitHub	4ecd6ad3	Fix how we set logging levels (#3703 ) * cleanup logging * comments and cleanup * pylint, gym	5 年前
Andrew Cohen	59b88be6	Merge branch 'master' into self-play-mutex	5 年前
GitHub	9cbc3fa2	Asymmetric self-play (#3653 )	5 年前
Ervin Teng	06fa3d39	Merge branch 'master' into develop-sac-apex	5 年前
Anupam Bhatnagar	50e52d9c	Merge branch 'master' into distributed-training	5 年前
Andrew Cohen	3de78baa	wrapped trainer has internal policy ghost	5 年前
Andrew Cohen	3013774b	alternative to internal-policy fix	5 年前
Anupam Bhatnagar	001fce2a	first commit	5 年前
Anupam Bhatnagar	9341f7a2	[skip-ci] small refactors	5 年前
GitHub	b841c9ab	Wrapped trainer has internal policy in GhostTrainer	5 年前
Andrew Cohen	930d6fa3	Merge branch 'self-play-mutex' into soccer-2v1	5 年前
Ervin Teng	f29b17a9	Don't block one policy queue Only put policies when policy is actually updated	5 年前
Anupam Bhatnagar	eb9f3f19	[skip ci] replace buffer length by buffer size	5 年前
GitHub	aae58330	Merge branch 'master' into develop-add-inference-examples	5 年前
Anupam Bhatnagar	7ae32cc2	[skip ci] replace buffer length by buffer size	5 年前
Andrew Cohen	b0c506a6	Merge branch 'soccer-2v1' into asymm-envs	5 年前
Anupam Bhatnagar	ac80ec82	[skip ci] increment steps on training	5 年前
Anupam Bhatnagar	d49ceecc	[skip ci] moving summary writer to update_policy [skip ci] more fixes [skip ci] tweaking 3dball configs [skip ci] swap summary writer and step increment order	5 年前
Anupam Bhatnagar	95ba923d	[skip ci] fix first summary statement output	5 年前
Anupam Bhatnagar	e8d09d00	[skip ci] increment steps on training	5 年前
Ervin Teng	5e980ec1	Merge branch 'master' into develop-sac-apex	5 年前
Anupam Bhatnagar	45bac63e	[skip ci] more fixes	5 年前
Anupam Bhatnagar	86e16a64	[skip ci] tweaking 3dball configs	5 年前
Anupam Bhatnagar	2c68e921	[skip ci] fix first summary statement output	5 年前
Anupam Bhatnagar	9d7dd3b6	[skip ci] moving step increment to trainer from environment for sac	5 年前
Andrew Cohen	de0656b6	Merge branch 'internal-policy-ghost' into soccer-2v1	5 年前
Andrew Cohen	85304aff	Merge branch 'soccer-2v1' into asymm-envs	5 年前
Andrew Cohen	89db8428	Merge branch 'internal-policy-ghost-alternate' into soccer-2v1	5 年前
Andrew Cohen	26c0033c	Merge branch 'soccer-2v1' into asymm-envs	5 年前
Arthur Juliani	6879bae4	Initial optimizer port	5 年前
GitHub	4d23200b	[refactor] Run Trainers in separate threads (#3690 )	5 年前
Arthur Juliani	7c3bd376	Refactoring policy and optimizer	5 年前
Arthur Juliani	2e51260a	Resolving a few bugs	5 年前
Arthur Juliani	947f0d32	Slightly closer to running model	5 年前
Arthur Juliani	3c82bf59	Training runs, but doesn’t actually work	5 年前
Arthur Juliani	8c6f4696	Fix a couple additional bugs	5 年前
Arthur Juliani	61d671d8	Add conditional sigma for distribution	5 年前
Arthur Juliani	4a50444f	Support discrete actions as well	5 年前
Arthur Juliani	a11a79e4	Continuous and discrete now train	5 年前
Arthur Juliani	a5b5b109	Multi-discrete now working	5 年前
Arthur Juliani	5f936990	Visual observations now train as well	5 年前
Arthur Juliani	212e2d1d	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 年前
GitHub	232519e4	[refactor] Move output artifacts to a single results/ folder (#3829 )	5 年前
Arthur Juliani	82688e5c	GRU in-progress and dynamic cnns	5 年前
Arthur Juliani	29223931	Fix for memories	5 年前
Arthur Juliani	1736559f	Combine actor and critic classes. Initial export.	5 年前
Arthur Juliani	ca887743	Support tf and pytorch alongside one another	5 年前
Arthur Juliani	9835d26c	Prepare model for onnx export	5 年前
GitHub	422247a0	update versions for patch release (#3970 ) * update versions for patch release * Update precommit flake8 (#3961) * fix changelog	5 年前
Chris Elion	68b68396	Merge remote-tracking branch 'origin/master' into release_1_to_master	5 年前
GitHub	4641038e	Renaming max_step to interrupted in TerminalStep(s) (#3908 )	5 年前
vincentpierre	c34dd5b6	Merge branch 'master' into develop-gym-wrapper	5 年前
Andrew Cohen	a2f8319a	Merge branch 'master' into asymm-envs	5 年前
Arthur Juliani	89ad3020	Merge remote-tracking branch 'origin/master' into develop-add-fire # Conflicts: # ml-agents/mlagents/trainers/policy/tf_policy.py	5 年前
Arthur Juliani	be7e55e1	Use LSTM and fix a few merge errors	5 年前
Andrew Cohen	4a3ad193	Add constant decay to beta and epsilon	5 年前
GitHub	c5b94ca6	Use LR schedule for beta and epsilon (#3940 )	5 年前
Arthur Juliani	2b3a6347	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 年前
Arthur Juliani	b7be7f04	Fix bug in probs calculation	5 年前
Arthur Juliani	3eef9d78	Optimize np -> tensor operations	5 年前
Christopher Goy	ba80b292	format files with pre-commit.	4 年前
GitHub	e274bcf6	Update precommit flake8 (#3961 ) * fix flake8 errors * update flake8 hook * update flake8 plugins	5 年前
GitHub	f7373172	Merge pull request #4385 from Unity-Technologies/release_2_verified-barracuda-1.0.2 update verified branch with barracuda 1.0.2	4 年前
Ervin Teng	72180f9b	Experiment with JIT compiler	5 年前
Andrew Cohen	1e50c76e	calculating gradient norms	5 年前
vincentpierre	6ddfe74f	Merge branch 'master' into develop-gym-wrapper	5 年前
Andrew Cohen	0e965a4d	sensitivity	5 年前
Andrew Cohen	c1f91b5a	slightly nicer output	5 年前
Andrew Cohen	23b84dea	ignoring commit checks but write to csv	5 年前
Andrew Cohen	61aa9915	write to csv	5 年前
Andrew Cohen	d794964f	constant beta	5 年前
Arthur Juliani	28e095e0	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 年前
Ervin Teng	f214836a	Changes for speed test	5 年前
Andrew Cohen	13c2a209	added opp, decay eps removed	5 年前
GitHub	e92b4f88	[refactor] Structure configuration files into classes (#3936 )	5 年前
Andrew Cohen	50e4585f	fixed beta	5 年前
GitHub	09853e13	[refactor] Move checkpoint saving into trainer (#4034 )	5 年前
GitHub	7229214c	[cleanup] Remove unused param keys (#4067 )	5 年前
Andrew Cohen	c0f7052b	Merge branch 'master' into develop-sampler-refactor	5 年前
Andrew Cohen	34ecc7e6	Merge branch 'master' into asymm-envs	5 年前
GitHub	a1c63c4b	Release 3 Cherry-pick bug-fixes and doc changes from master (#4102 ) * [bug-fix] Fix regression in --initialize-from feature (#4086) * Fixed text in GettingStarted page specifying the logdir for tensorboard. Before it was in a directory summaries which no longer existed. Results are now saved to the results dir. (#4085) * [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087) * Reverting bug introduced in #4071 (#4101) Co-authored-by: Scott <Scott.m.jordan91@gmail.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	5 年前
GitHub	8a49e8e0	[refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087 )	5 年前
Anupam Bhatnagar	4afd8f92	first commit	4 年前
Andrew Cohen	21f871db	Merge branch 'develop-constant-decay' into asymm-envs	5 年前
Anupam Bhatnagar	8b6c19ae	[skip ci] adding should_still_train method to ppo	4 年前
Anupam Bhatnagar	392a84f1	[skip ci] fixing property decorator in sac	4 年前
Arthur Juliani	9724c9ac	Merge master	4 年前
Arthur Juliani	46874cc7	ONNX exporting	4 年前
GitHub	0d80d87a	Fix for discrete actions (#4181 )	4 年前
Anupam Bhatnagar	24d5f881	first commit	4 年前
GitHub	cde8bd29	Convert List[np.ndarray] to np.ndarray before using torch.as_tensor (#4183 ) Big speedup in visual obs	4 年前
GitHub	05a11c96	Develop add fire exp framework (#4213 ) * Experiment branch for comparing torch * Updates and merging ervin changes * improvements on experiment_torch.py * Better printing of results * preliminary gpu experiment * Testing gpu * Prepare to see a lot of commits, because I like my IDE and I am testing on a server and I am using git to sync the two * Prepare to see a lot of commits, because I like my IDE and I am testing on a server and I am using git to sync the two * _ * _ * _ * _ * _ * _ * _ * _ * Attempt at gpu on tf. Does not work * _ * _ * _ * _ * _ * _ * _ * _ * _ * _ * _ * Fixing learn.py	4 年前
GitHub	45154f52	Pytorch port of SAC (#4219 )	4 年前
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263 ) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 年前
GitHub	69579611	[refactor] Refactor Actor and Critic classes (#4287 )	4 年前
Ruo-Ping Dong	6feec58a	add Saver class (only TF working)	4 年前
GitHub	93517833	[feature] Fix TF tests, add --torch CLI option, allow run TF without torch installed (#4305 )	4 年前
Andrew Cohen	f74d301a	Merge branch 'develop-add-fire' into develop-add-fire-bc	4 年前
vincentpierre	599d7e9f	Merging master	4 年前
GitHub	3a982317	[add-fire] Add learning rate and beta/epsilon decay to PyTorch (#4318 )	4 年前
GitHub	7ddfd81f	Added Reward Providers for Torch (#4280 ) * Added Reward Providers for Torch * Use NetworkBody to encode state in the reward providers * Integrating the reward providers with ppo and torch * work in progress, integration with PPO. Not training properly Pyramids at the moment * Integration in PPO * Removing duplicate file * Gail and Curiosity working * addressing comments * Enforce float32 for tests * enforce np.float32 in buffer	4 年前
Andrew Cohen	bf8b2328	Merge branch 'develop-add-fire' into develop-add-fire-bc	4 年前
HH	7afa1761	Merge branch 'master' into hh/develop/ragdoll-updates	5 年前
Ruo-Ping Dong	71fe4df6	fix formatting and test	4 年前
Ruo-Ping Dong	09a741c8	small improvement	4 年前
Ruo-Ping Dong	79d89158	Merge branch 'develop-add-fire' into develop-add-fire-checkpoint	4 年前
GitHub	3bcb029b	[refactor] Remove BrainParameters from Python code (#4138 )	4 年前
Ruo-Ping Dong	e06812aa	fix tests	4 年前
HH	0fdac847	Merge branch 'master' into hh/develop/crawler-ragdoll-updates	5 年前
GitHub	84440f05	Convert checkpoints to .NN (#4127 ) This change adds an export to .nn for each checkpoint generated by RLTrainer and adds a NNCheckpointManager to track the generated checkpoints and final model in training_status.json. Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>	4 年前
Arthur Juliani	6bee0fd1	Merge master	4 年前
GitHub	1f5eb9da	add pyupgrade to pre-commit and run (#4239 )	4 年前
GitHub	129f9ddc	[MLA-427] make pyupgrade convert f-strings too (#4244 ) * make pyupgrade convert f-strings too	4 年前
HH	9e6edb6c	try new reward falloff	4 年前
HH	c3c83920	cleanup	4 年前
Andrew Cohen	d8c123a0	Merge branch 'master' into sensitivity	4 年前
Andrew Cohen	02df39ab	ignore precommit	4 年前
Andrew Cohen	fa35292c	write hist to tb	4 年前
GitHub	1b098c9a	Refactor TFPolicy and Policy (#4254 ) * Refactor TFPolicy and Policy	4 年前
GitHub	380fef57	[refactor] Move TF-specific files to tf/ folder (#4266 )	4 年前
GitHub	beb5aca5	[refactor] Make classes except Optimizer framework agnostic (#4268 )	4 年前
Andrew Cohen	06e4356c	Merge branch 'master' into sensitivity	4 年前
Arthur Juliani	1a123641	Merge remote-tracking branch 'origin/master' into r5-master	4 年前
GitHub	3f44a0bc	cleanup around AdamOptimizer (#4333 ) * cleanup around AdamOptimizer * methods to create Optimizer instances	4 年前
Andrew Cohen	598826fe	Merge branch 'develop-add-fire' into develop-add-fire-bc	4 年前
Ruo-Ping Dong	d3eb6c46	Merge branch 'develop-add-fire' into develop-add-fire-checkpoint	4 年前
Ervin Teng	eaa59cf4	Use loss masks in PPO.	4 年前
Ruo-Ping Dong	95858e25	update saver interface and add tests	4 年前
Anupam Bhatnagar	a5cc4d03	Merge branch 'master' into global-variables	4 年前
Ervin Teng	a48a0af4	Proper shape of masks	4 年前
Ruo-Ping Dong	523248be	update	4 年前
GitHub	f374f87a	[add-fire] Add LSTM to SAC, LSTM fixes and initializations (#4324 )	4 年前
Ervin Teng	1d4bc99e	Proper mask mean for PPO	4 年前
Ervin Teng	6ba23234	Fix dtype for actions	4 年前
HH	8eaddb61	Merge branch 'master' into hh/develop/loco-walker-variable-speed	4 年前
Ruo-Ping Dong	59cc1a9f	Merge branch 'develop-add-fire' into develop-add-fire-checkpoint	4 年前
Ruo-Ping Dong	409a161c	fix bc tests	4 年前
GitHub	25dc8c3d	Add Saver Class to handle all save/load/checkpoint/export work (#4323 )	4 年前
Ervin Teng	d65a9326	Merge branch 'master' into develop-add-fire-mm3	4 年前
Ruo-Ping Dong	d57aa9ab	Merge branch 'develop-add-fire-mm3' into develop-add-fire-checkpoint	4 年前
GitHub	bd6bcd2f	Merge master and add Saver class for save/load checkpoints	4 年前
Ervin Teng	f8b40b9b	Don't flatten when there are multiple continuous actions	4 年前
GitHub	6de31a03	[add-fire] Fix masked mean for 2d tensors (#4364 )	4 年前
Ervin Teng	5c1717d1	Bugfixes for continuous case	4 年前
Ervin Teng	42e25b25	Merge branch 'develop-add-fire' into develop-add-fire-memoryclass	4 年前
GitHub	8985a040	Removing the experiment script from add fire (#4373 ) * Removing the experiment script * Removing the script	4 年前
Christopher Goy	5a233353	Merge remote-tracking branch 'origin/master' into release_6-to-master	4 年前
Andrew Cohen	a65d08c7	ghost trainer tests	4 年前
GitHub	49545ce1	Pytorch ghost trainer (#4370 )	4 年前
Ervin Teng	a04e68a4	Merge branch 'develop-add-fire' into develop-add-fire-memoryclass	4 年前
HH	c72553c8	reset these to master	4 年前
Andrew Cohen	fcec6734	added comments	4 年前
GitHub	0d0d2ead	[add-fire] Revert unneeded changes back to master (#4389 )	4 年前
Ervin Teng	987ea2d0	Revert unneeded changes back to master	4 年前
Andrew Cohen	e7c9ff35	clean up docstrings create policies	4 年前
Andrew Cohen	039ae17f	capitalize Tensorflow	4 年前
GitHub	1955af9e	[feature] Add experimental PyTorch support (#4335 ) * Begin porting work * Add ResNet and distributions * Dynamically construct actor and critic * Initial optimizer port * Refactoring policy and optimizer * Resolving a few bugs * Share more code between tf and torch policies * Slightly closer to running model * Training runs, but doesn’t actually work * Fix a couple additional bugs * Add conditional sigma for distribution * Fix normalization * Support discrete actions as well * Continuous and discrete now train * Multi-discrete now working * Visual observations now train as well * GRU in-progress and dynamic cnns * Fix for memories * Remove unused arg * Combine actor and critic classes. Initial export. * Support tf and pytorch alongside one another * Prepare model for onnx export * Use LSTM and fix a few merge errors * Fix bug in probs calculation * Optimize np -> tensor operations * Time action sample funct...	4 年前
vincentpierre	9f51ab14	Saving the reward providers	4 年前
Ruo-Ping Dong	c47ffc20	Rename saver	4 年前
vincentpierre	108fac9a	Replace torch.detach().cpu().numpy() with a utils method	4 年前
HH	d9962254	Merge branch 'master' into hh/develop/loco-walker-variable-speed	4 年前
GitHub	ec8c24d8	add fire clean up docstrings in create policies (#4391 )	4 年前
GitHub	328353bc	Torch : Saving/Loading of the reward providers (#4405 ) * Saving the reward providers * adding tests * Moved the tests around * Update ml-agents/mlagents/trainers/tests/torch/saver/test_saver_reward_providers.py * Update ml-agents/mlagents/trainers/tests/torch/saver/test_saver_reward_providers.py * Update ml-agents/mlagents/trainers/tests/torch/saver/test_saver_reward_providers.py Co-authored-by: Ruo-Ping (Rachel) Dong <ruoping.dong@unity3d.com> * Update ml-agents/mlagents/trainers/tests/torch/saver/test_saver_reward_providers.py Co-authored-by: Ruo-Ping (Rachel) Dong <ruoping.dong@unity3d.com> Co-authored-by: Ruo-Ping (Rachel) Dong <ruoping.dong@unity3d.com>	4 年前
vincentpierre	31750e97	Using item() in place of to_numpy()	4 年前
Ruo-Ping Dong	88eff042	Merge branch 'master' into develop-saver-name	4 年前
GitHub	48f217b9	Rename Saver to ModelSaver (#4402 ) Rename Saver to ModelSaver to avoid confusion with tf.Saver	4 年前
Anupam Bhatnagar	f4f1a8d9	merge master into trainer-plugin branch	4 年前
GitHub	498934f9	Replace torch.detach().cpu().numpy() with a utils method (#4406 ) * Replace torch.detach().cpu().numpy() with a utils method * Using item() in place of to_numpy() * more use of item() and additional tests	4 年前
Ruo-Ping Dong	27fb4270	brain_name to behavior_name	4 年前
GitHub	bfda9576	Replace brain_name with behavior_name (#4419 ) brain_name -> behavior_name some prob -> log_prob in comments rename files optimizer -> optimizer_tf for tensorflow	4 年前
Ruo-Ping Dong	fd1dc3a6	Merge branch 'master' into develop-torch-omp	4 年前
Ruo-Ping Dong	f5dee9d1	jit for continuous control	4 年前
GitHub	4e93cb6e	[torch] Restructure PyTorch encoders (#4421 ) * Move linear encoding to NetworkBody * moved encoders to processors (#4420) * fix bad merge * Get it running * Replace mentions of visual_encoders * Remove output_size property * Fix tests * Fix some references * Revert test_simple_rl * Fix networks test * Make curiosity test more accommodating * Rename total_input_size * [Bug fix] Fix bug in GAIL gradient penalty (#4425) (#4426) Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com> * Up number of steps * Rename to visual_processors and vector_processors Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com> Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 年前
GitHub	6f534366	Add torch_utils class, auto-detect CUDA availability (#4403 ) * Add torch_utils * Use torch from torch_utils * Add torch to banned modules in CI * Better import error handling * Fix flake8 errors * Address comments * Move networks to GPU if enabled * Switch to torch_utils * More flake8 problems * Move reward providers to GPU/CPU * Remove another set default tensor * Fix banned import in test	4 年前
Ruo-Ping Dong	fb50b0ec	add wb	4 年前
Andrew Cohen	3997b14b	Merge branch 'master' into develop-hybrid-actions	4 年前
Ervin Teng	3e771cbb	Permute visual obs outside of network	4 年前
Ervin Teng	77c810fb	Fix SAC and make utility method	4 年前
vincentpierre	181bdec0	-	4 年前
Andrew Cohen	643c8e58	ppo extended	4 年前
Andrew Cohen	44c9879e	action models	4 年前
Ervin Teng	e8431a6d	Proper dimensions for entropy, sum before bonus in PPO	4 年前
GitHub	c188781b	[life improvement] Moving Python files around (#4531 ) * Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder * Addressing comments * Editing the migrating doc * fixing test	4 年前
Ervin Teng	be159ad3	Make entropy reporting same as TF	4 年前
Ervin Teng	b3e15d30	Always use separate critic	4 年前
Ervin Teng	bbf7b71d	Revert to shared	4 年前
Andrew Cohen	e5f14400	Merge branch 'master' into develop-hybrid-actions-singleton	4 年前
GitHub	a690af74	[refactor] Make PyTorch the default and TensorFlow optional (#4517 ) * Torch setup.py * Set torch to default * Make torch default in setup.py * Remove indents * Remove other instances of TF being used * Add tensorboard to setup.py * Adding correct setup commands for verifying torch is installed (#4524) * Adding correct setup commands for verifying torch is installed * Editing the test_requirements to add tf and remove torch * Develop torchdefault raise outside setup (#4530) * Torch not imported error to raise at first usage * Torch not imported error to raise at first usage * [refactor] Use PyTorch TensorBoard utils (#4518) * Convert stats writer to use PyTorch TB support * Use common function to print params * Update test * Bump tensorboard to 1.15 to fix the tests * putting tensorboard 1.15.0 as min version requirement Co-authored-by: vincentpierre <vincentpierre@unity3d.com> * [Docs] Initial documentation changes for making...	4 年前
Andrew Cohen	eaecb59e	torch utils to and from buffer	4 年前
Andrew Cohen	8013e544	ignoring Instance of 'AbstractContextManager' has no 'enter_context' member (no-member)	4 年前
GitHub	e0ef30a5	[bug-fix] Change entropy computation and loss reporting in Torch to match TF (#4538 ) * Proper dimensions for entropy, sum before bonus in PPO * Make entropy reporting same as TF * Always use separate critic * Revert to shared * Remove unneeded extra line * Change entropy shape in test * Change another entropy shape * Add entropy summing to evaluate_actions * Add notes about torch.abs(policy_loss)	4 年前
GitHub	cb8e4d25	Add ActionSpec (#4586 ) Co-authored-by: Ervin T <ervin@unity3d.com>	4 年前
Andrew Cohen	9689cf2c	remove _action_ from function names	4 年前
vincentpierre	a3a9a56b	Merge branch 'exp-multi-head-attention' into exp-bullet-hell	4 年前
Ruo-Ping Dong	9e08be87	Merge branch 'master' into release_9_branch_merge	4 年前
vincentpierre	d3d4eb90	Trainer with attention	4 年前
vincentpierre	7ef3c9a1	Trainer with attention	4 年前
GitHub	b853e5ba	Action buffer (#4612 ) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 年前
GitHub	3c96a3a2	Action Model (#4580 ) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 年前
GitHub	88d3ec3e	Merge master into hybrid actions staging branch (#4704 )	4 年前
GitHub	23800f33	Merge branch 'master' into develop-action-spec	4 年前
GitHub	85a7c0f7	[bug-fix] Add clipping to PyTorch policy, fix initialization (#4649 )	4 年前
Ervin Teng	184f27c6	Make buffer type-agnostic	4 年前
Ervin Teng	0548057d	Use real clipping (as in TF)	4 年前
Ervin Teng	0cdb2040	Use tanh squash	4 年前
GitHub	3ab45b3f	[bug-fix] Separate critic only for PPO (#4661 )	4 年前
GitHub	2a8c6800	[bug-fix] Add clipping to PyTorch policy, fix initialization (#4649 ) (#4662 )	4 年前
Ruo-Ping Dong	953cb6bb	Merge branch 'master' into develop-windows-delay	4 年前
Ervin Teng	2be74856	Double policy loss for no reason	4 年前
GitHub	f1206bed	Cherry-pick separate critic only for PPO (#4661 ) (#4666 )	4 年前
Ervin Teng	3b15cc32	Multiprocessing but Stats are quite broken	4 年前
Ervin Teng	3eba7423	Increase initialization	4 年前
Andrew Cohen	3f771e61	add ActionBuffers and utils	4 年前
Ervin Teng	3765c15a	Merge branch 'develop-multitype-buffer' into develop-unified-obs	4 年前
Ervin Teng	7a0ebfbd	Pretty broken	4 年前
Ervin Teng	95bdbba3	Less broken PPO	4 年前
vincentpierre	b863af57	Removing TensorFlow Trainers	4 年前
Ervin Teng	3b614302	Merge branch 'develop-multitype-buffer' into develop-centralizedcritic	4 年前
Ervin Teng	6c77ac7a	Update SAC, fix PPO batching	4 年前
vincentpierre	713e65fb	removing tensorflow testing for pytest and yamato	4 年前
Andrew Cohen	bd917c9c	action buffer passes continuous	4 年前
vincentpierre	2dd34aa5	Formatting	4 年前
Andrew Cohen	ad951493	debugging discrete	4 年前
Andrew Cohen	fcf6471e	2d discrete passes	4 年前
Ervin Teng	fdaa8c3d	Merge branch 'develop-unified-obs' into develop-centralizedcritic	4 年前
Andrew Cohen	056630d7	sac continuous and discrete train	4 年前
GitHub	990f801a	Develop hybrid action staging (#4702 ) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com> Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com>	4 年前
vincentpierre	735fcd52	[WIP] Refactor trainers to use list of obs rather than vec and vis obs	4 years ago
Ervin Teng	6846af21	Multi-input network	4 years ago
vincentpierre	93ca1409	fixing the tests	4 years ago
vincentpierre	7a5cc9ec	Merge master into develop-rm-tf	4 years ago
Ervin Teng	56dcd75a	Get next critic observations into value estimate	4 years ago
vincentpierre	c1587bce	Solving merge conflicts	4 years ago
Andrew Cohen	8172b3d6	test_simple_rl/reward providers pass tf/torch	4 years ago
Arthur Juliani	0d2f8887	Merge remote-tracking branch 'origin/master' into goal-conditioning # Conflicts: # ml-agents-envs/mlagents_envs/base_env.py # ml-agents-envs/mlagents_envs/rpc_utils.py # ml-agents/mlagents/trainers/tests/mock_brain.py # ml-agents/mlagents/trainers/tests/simple_test_envs.py	4 years ago
Andrew Cohen	73b778cc	rename extract to from_dict	4 years ago
GitHub	cc6b4564	Multi Directional Walker and Initial Hypernetwork (#4740 )	4 years ago
Ervin Teng	25dfd883	Merge branch 'master' into develop-centralizedcritic	4 years ago
Andrew Cohen	cd73cce2	test_trajectory fixed	4 years ago
GitHub	22658a40	use sensor types to differentiate obs (#4749 )	4 years ago
GitHub	903d3afe	Merge pull request #4707 from Unity-Technologies/develop-rm-tf Removing TensorFlow Trainers	4 years ago
Andrew Cohen	498b1ee6	Merge branch 'develop-action-buffer' into develop-hybrid-actions-singleton	4 years ago
GitHub	d2d46103	Remove print from ppo tf opti Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
Andrew Cohen	68b98915	Merge branch 'develop-action-buffer' of https://github.com/Unity-Technologies/ml-agents into develop-action-buffer	4 years ago
GitHub	29d94c7c	Merge pull request #4734 from Unity-Technologies/develop-obs-as-list Refactor trainers to use list of obs rather than vec and vis obs	4 years ago
Andrew Cohen	1d234d1d	bc works	4 years ago
Andrew Cohen	c0d01baf	Merge branch 'master' into merge-release11-master	4 years ago
Andrew Cohen	e81e68de	comms agent and fixed hallway	4 years ago
vincentpierre	44ed3258	Merging master	4 years ago
Andrew Cohen	ca5a5194	soccer comms on the cloud	4 years ago
Andrew Cohen	3457cd3c	save only discrete actions as prev	4 years ago
Andrew Cohen	12828bdc	remove tau from diff for	4 years ago
Andrew Cohen	55e928cf	fix 16 envs soccer	4 years ago
vincentpierre	449712b0	renaming sensor_spec to sensor_specS	4 years ago
Andrew Cohen	35769b53	Merge branch 'develop-action-buffer' into develop-hybrid-actions-singleton	4 years ago
Andrew Cohen	c843e3d4	hallway collab exps on cloud	4 years ago
Andrew Cohen	a20287f7	continuous comms	4 years ago
Andrew Cohen	14ea0ad2	comment out comms in ppo optimizer	4 years ago
Andrew Cohen	17496265	move AgentAction, ActionLogProbs, and ActionFlattener to separate files	4 years ago
Chris Elion	76ebc20c	Merge remote-tracking branch 'origin/master' into r12-to-master	4 years ago
Andrew Cohen	f57875e0	layer norm	4 years ago
GitHub	458fee17	Merge pull request #4763 from Unity-Technologies/develop-att WIP Made initial changes to enable dimension properties and added attention module	4 years ago
Andrew Cohen	bc77c990	layer norm and weight decay with fixed architecture	4 years ago
Ervin Teng	330fc1d0	Merge branch 'master' into develop-centralizedcritic-mm	4 years ago
vincentpierre	519c5f47	merging master	4 years ago
Ruo-Ping Dong	8ed14762	Merge branch 'develop-hybrid-actions-singleton' into develop-hybrid-actions-csharp	4 years ago
Andrew Cohen	96c01a63	custom layer norm	4 years ago
GitHub	cc948a41	Policy output actiontuple (#4651 )	4 years ago
GitHub	14129a08	[MLA-470] Barracuda + TF cleanup (#4837 ) * remove barracuda conversion, tensorflow cleanup * unused var	4 years ago
Andrew Cohen	1bc2ff96	add weight decay to trainers	4 years ago
Arthur Juliani	0b4b0992	Rename more files	4 years ago
Ervin Teng	aba633b2	Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm	4 years ago
Ruo-Ping Dong	a7d04be6	Merge branch 'develop-hybrid-actions-singleton' into develop-hybrid-actions-csharp	4 years ago
Ruo-Ping Dong	180d3e20	Merge branch 'develop-centralizedcritic-mm' into develop-cc-teammanager	4 years ago
HH	0024a286	merge ervin's new stuff	4 years ago
Ervin Teng	9c3da1b6	New buffer layout, TeamObsUtil, pad dead agents	4 years ago
GitHub	67ad9651	Merge pull request #4825 from Unity-Technologies/sensor-types [WIP] Observation Types	4 years ago
vincentpierre	8660b1c2	merging master	4 years ago
Ervin Teng	3daa17a9	Merge branch 'develop-centralizedcritic-mm' into develop-zombieteammanager	4 years ago
Ervin Teng	6b8b3db3	Try subtract marginalized value	4 years ago
Ervin Teng	2203fc0e	Bootstrap if teammates not done	4 years ago
Ervin Teng	092ea232	Some more progress - still broken	4 years ago
Ervin Teng	457b2630	I think it's running	4 years ago
Ervin Teng	3e481f7d	Fix issue with team_actions	4 years ago
brccabral	457fb612	Merge branch 'master' of https://github.com/Unity-Technologies/ml-agents	4 years ago
Ervin Teng	0919a32d	Add next action and next team obs	4 years ago
Andrew Cohen	07e92563	Merge branch 'develop-centralizedcritic-counterfact' into develop-coma2	4 years ago
Andrew Cohen	6e1826f8	might be right	4 years ago
vincentpierre	52b011d6	_	4 years ago
vincentpierre	5f9ea5ea	_	4 years ago
vincentpierre	6f3ea7b8	_	4 years ago
Andrew Cohen	feb38012	add lambda return and target network	4 years ago
Andrew Cohen	5741f8f6	no target net	4 years ago
Andrew Cohen	79c658d2	remove normalize advantages	4 years ago
Andrew Cohen	a92baab6	add target network back	4 years ago
Andrew Cohen	a4c336c2	value estimator	4 years ago
vincentpierre	115e944b	adding weight decay for experimentation	4 years ago
Andrew Cohen	d1285626	add target net	4 years ago
Andrew Cohen	bd341f7f	no target, increase lambda	4 years ago
Andrew Cohen	bdd73403	remove prints	4 years ago
Andrew Cohen	8a5d291f	use v return	4 years ago
Andrew Cohen	6b2a6c5f	use target net	4 years ago
Andrew Cohen	fce842aa	adding zombie to coma2 brnch	4 years ago
Andrew Cohen	7f491ae7	cloud run with coma2 of held out zombie test env	4 years ago
Andrew Cohen	9af22d30	use only value funcs	4 years ago
Andrew Cohen	a3453c5d	target of baseline is returns_v	4 years ago
Andrew Cohen	511a9a7e	no baseline	4 years ago
Andrew Cohen	e3239529	remove target update	4 years ago
Andrew Cohen	95253b47	ntegrate teammate dones	4 years ago
Andrew Cohen	2c3147b9	add value clipping	4 years ago
Andrew Cohen	687f411b	try again on cloud	4 years ago
Andrew Cohen	b0bf7817	clipping values and updated zombie	4 years ago
Andrew Cohen	b5271926	remove value head clipping	4 years ago
Ervin Teng	a4eaebcb	Add trust region to COMA updates	4 years ago
Ervin Teng	bca6c92c	Add clipping, use same network for value	4 years ago
Ervin Teng	3283b6a1	Remove Q-net for perf	4 years ago
Ervin Teng	3aefac39	Use GAE again	4 years ago
GitHub	64fc7f43	Buffer key enums (#4907 )	4 years ago
Andrew Cohen	b08318f9	add clipping	4 years ago
Ervin Teng	adad5183	Weight decay, regularizaton loss	4 years ago
Ervin Teng	4fe8d036	Try reduce bias	4 years ago
Andrew Cohen	39592650	remove clipping	4 years ago
Ervin Teng	2be83146	Use same network	4 years ago
Ervin Teng	6094613d	try reduce bias more	4 years ago
Andrew Cohen	74885bab	add local reward to plot	4 years ago
Ervin Teng	ac4dc336	Remove reg loss, still stable	4 years ago
Andrew Cohen	c08fefbc	reduce initialization weights	4 years ago
Ervin Teng	64b34759	Black format	4 years ago
Ervin Teng	1cf27871	Merge branch 'develop-coma2-samenet' into develop-coma2-samenet-sum	4 years ago
Ervin Teng	b6f88d6d	Merge branch 'develop-base-teammanager' into develop-agentprocessor-teammanager	4 years ago
Andrew Cohen	6bd396ee	add critic to optimizer, ppo runs	4 years ago
Andrew Cohen	3aec18a1	fix precommit errors	4 years ago
Andrew Cohen	8efdeeb0	make critic a property	4 years ago
Ervin Teng	0bde7598	Back out trainer changes	4 years ago
Andrew Cohen	c74dca9f	add SharedActorCritic	4 years ago
Ruo-Ping Dong	c87bce9e	Merge branch 'master' into develop-base-teammanager	4 years ago
Ervin Teng	a9116382	Bug fixes	4 years ago
Andrew Cohen	98d647de	MultiInputNetBody	4 years ago
Ervin Teng	ae7643b8	Proper critic memories for PPO	4 years ago
vincentpierre	e1b94b8b	Merge branch 'master' into develop-var-len-obs-feature	4 years ago
Chris Elion	e4f51ca7	Merge remote-tracking branch 'origin/master' into MLA-1734-demo-provider	4 years ago
Ervin Teng	d4438878	Merge branch 'develop-base-teammanager' into develop-agentprocessor-teammanager	4 years ago
Ervin Teng	fd3f05b9	Enable GAIL to decay	4 years ago
Ervin Teng	97842f81	Fix non-lstm PPO	4 years ago
Ervin Teng	e46a86ad	Merge branch 'master' into develop-superpush-int	4 years ago
HH	15d512f9	Merge branch 'master' into hh/develop/dodgeball	4 years ago
Ervin Teng	9bc88c41	Running COMA (not sure if learning)	4 years ago
Ervin Teng	2f209c12	Buffer fixes (cherry picked from commit 2c03d2b544d0c615e7b60d939f01532674d80753)	4 years ago
GitHub	338af2ec	Move the Critic into the Optimizer (#4939 ) Co-authored-by: Ervin Teng <ervin@unity3d.com>	4 years ago
HH	4c947151	Merge branch 'main' into hh/develop/dodgeball	4 years ago
Andrew Cohen	4b58527c	checkout ppo/optimizer from main	4 years ago
Ervin Teng	61781a1a	Merge branch 'main' into develop-agentprocessor-teammanager	4 years ago
Andrew Cohen	9060da06	Merge branch 'develop-agentprocessor-teammanager' into develop-coma2-trainer	4 years ago
Arthur Juliani	06c147f8	Merge remote-tracking branch 'origin/main' into goal-conditioning-new # Conflicts: # Project/Assets/ML-Agents/Examples/Crawler/Prefabs/CrawlerBase.prefab # Project/Assets/ML-Agents/Examples/GridWorld/Prefabs/Area.prefab # Project/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity # Project/ProjectSettings/TagManager.asset # com.unity.ml-agents/Runtime/Sensors/CameraSensor.cs # com.unity.ml-agents/Runtime/Sensors/VectorSensor.cs # ml-agents/mlagents/trainers/torch/networks.py # ml-agents/mlagents/trainers/torch/utils.py	4 years ago
GitHub	d36a5242	Python Dataflow for Group Manager (#4926 ) * Make buffer type-agnostic * Edit types of Apped method * Change comment * Collaborative walljump * Make collab env harder * Add group ID * Add collab obs to trajectory * Fix bug; add critic_obs to buffer * Set group ids for some envs * Pretty broken * Less broken PPO * Update SAC, fix PPO batching * Fix SAC interrupted condition and typing * Fix SAC interrupted again * Remove erroneous file * Fix multiple obs * Update curiosity reward provider * Update GAIL and BC * Multi-input network * Some minor tweaks but still broken * Get next critic observations into value estimate * Temporarily disable exporting * Use Vince's ONNX export code * Cleanup * Add walljump collab YAML * Lower max height * Update prefab * Update prefab * Collaborative Hallway * Set num teammates to 2 * Add config and group ids to HallwayCollab * Fix bug with hallway collab * E...	4 years ago
Ervin Teng	c8137dcd	Merge branch 'main' into develop-superpush-int	4 years ago
GitHub	f16ce486	Update v2-staging from main (March 15) (#5123 )	4 years ago
GitHub	47db8ce1	[bug-fix] Fix padding for List entries in buffer (#5046 ) * Fix padding for List entries in buffer * Revert to coonverting to np.array * Fix dtype in PPO trainer	4 years ago
Christopher Goy	921ba4f0	Update v2-staging from main (March 15) (#5123 )	4 years ago
Christopher Goy	ebe45056	Merge branch 'main' into release_14_branch-to-main	4 years ago
Ervin Teng	8902c058	Merge branch 'main' into develop-coma2-trainer	4 years ago
GitHub	fc5d0a3f	[bug-fix] Fix save/restore critic, add test (#5062 ) * Fix save/restore critic, add test * Rename module for PPO * Use correct policy in test	4 years ago
Chris Elion	970f1d40	Merge remote-tracking branch 'origin/v2-staging' into MLA-1634-ObservationSpec	4 years ago
Ervin Teng	1f026c70	Merge branch 'main' into develop-superpush-branch-cleanup	4 years ago
Ervin Teng	ce872033	Revert "Merge branch 'main' into develop-superpush-branch-cleanup" This reverts commit 5bea802525381f931a5e0f8b8778fe27a12f03af, reversing changes made to cee3524e85161e13689d95f66bc6bff994d2cdfd.	4 years ago
GitHub	8f35bdd3	POCA trainer (#5005 ) Co-authored-by: Ervin Teng <ervin@unity3d.com> Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
Andrew Cohen	9e77d7e1	Merge branch 'main' into develop-soccer-groupman	4 years ago
GitHub	62314056	Fix ghost curriculum and make steps private (#5098 ) * use get step to determine curriculum * add to CHANGELOG * Make step in trainer private (#5099) Co-authored-by: Ervin T <ervin@unity3d.com>	4 years ago
Ervin Teng	54ffbed6	[cherry-pick] Fix ghost curriculum and make steps private (#5098 ) * use get step to determine curriculum * add to CHANGELOG * Make step in trainer private (#5099) Co-authored-by: Ervin T <ervin@unity3d.com>	4 years ago
Andrew Cohen	9176247c	Merge branch 'main' into develop-soccer-groupman-mod	4 years ago
GitHub	e81e038b	Fix end episode for POCA, add warning for group reward if not POCA (#5113 ) * Fix end episode for POCA, add warning for group reward if not POCA * Add missing imports	4 years ago
GitHub	63169e2c	[cherry-pick] Fix group rewards for POCA, add warning for non-POCA trainers (#5120 ) * Fix end episode for POCA, add warning for group reward if not POCA (#5113) * Fix end episode for POCA, add warning for group reward if not POCA * Add missing imports * Use np.any, which is faster	4 years ago
Ervin Teng	d1c24251	[bug-fix] When agent isn't training, don't clear update buffer (#5205 ) * Don't clear update buffer, but don't append to it either * Update changelog * Address comments * Make experience replay buffer saving more verbose (cherry picked from commit 63e7ad44d96b7663b91f005ca1d88f4f3b11dd2a)	4 years ago
Andrew Cohen	18be47e8	Merge branch 'main' into develop-soccer-groupman-mod	4 years ago
Ervin Teng	a9ca7b3b	Do burn-in for PPO	4 years ago
GitHub	ff21216d	[bug-fix] When agent isn't training, don't clear update buffer (#5205 ) * Don't clear update buffer, but don't append to it either * Update changelog * Address comments * Make experience replay buffer saving more verbose	4 years ago
vincentpierre	5d384292	forgot one	4 years ago

708 commits (88ad67f0-a15b-4c36-8c1c-c931c353d728)