ml-agents

Author	SHA1	Message	Committed (年前 = years ago)
Arthur Juliani	de700c3a	Multi Brain Training and Recurrent state encoder (#166 ) * `learn.py` is now main script for training brains. * Simultaneous multi-brain training is now possible. * `ghost-trainer` allows for proper training in adversarial scenarios. * `imitation-trainer` provides a basic implementation of real-time behavioral cloning. * All trainer hyperparameters now exist in `.yaml` files. * `PPO.ipynb` removed. * LSTM model added. * More dynamic buffer class to handle greater variety of scenarios.	7 年前
GitHub	51621334	State Stacking & Banana Environment (#262 ) * Add support for stacking past n states to allow network to learn temporal dependencies. * Add Banana Collector environment for demonstrating partially observable multi-agent environments. * Add 3DBall Hard which lacks velocity information in state representation. Used as test for LSTM and state-stacking features. * Rework Tennis environment to be continuous control and trainable in 100k steps.	7 年前
vincentpierre	b7f787f6	bug fix on range of observations	7 年前
Arthur Juliani	7bf0c888	trainer will raise an error if the memory of the brain is set wrong (#273 )	7 年前
GitHub	f8a8b112	Move epsilon generation into graph (#283 )	7 年前
GitHub	36d58cee	Add Seeding, MaxStepReached, and Bootstrapping fix (#303 ) * Add ability to seed learning (numpy, tensorflow, and Unity) with `--seed` flag. * Add `maxStepReached` flag to Agents and Academy. * Change way value bootstrapping works in PPO to take advantage of timeouts. * Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.	7 年前
GitHub	e676017b	Reorganize learn.py (#302 ) Split learn.py into learn.py as command-line wrapper, and trainer_controller.py as core trainer/env logic.	7 年前
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328 ) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	7 年前
GitHub	e11dae1d	Python Testing & Image Inference Improvements (#353 ) * Reorganized python tests into separate folder, and make individual test files for different (sub) modules. * Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon. * Minor bug fixes discovered while writing tests. * Reworked GridWorld to reset much faster. * Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.	7 年前
eshvk	030ac5c5	[cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups.	7 年前
Arthur Juliani	c3644f56	Buffer fix for properly masking gradients	7 年前
GitHub	f8d27dc5	Merge branch 'development-0.3' into feature/LSTM2	7 年前
GitHub	99103b29	Use `curr_brain_info`	7 年前
GitHub	f134016b	On Demand Decision (#308 ) * On Demand Decision : Use RequestDecision and RequestAction * New Agent Inspector : Use it to set On Demand Decision * New BrainParameters interface * LSTM memory size is now set in python * New C# API * Semantic Changes * Replaced RunMDP * New Bouncer Environment to test On Demand Decision	7 年前
GitHub	dcf58f75	Feature/previous text action (#375 ) * [Previous Text Actions] Renamed previous_action to previous_vector_action added previous_text_action to the BrainInfo * [Semantics] Carried the modifications to the semantics of previous_vector_action to the trainers	7 年前
GitHub	a7c9096f	[Semantics] Modified the placeholder names (#381 )	7 年前
GitHub	5bdef358	[Fix] Must take mean of entropy to avoid errors when the number of agents changes during training (#407 )	7 年前
GitHub	848b8a58	Fix PPO regression (#434 ) * Fix PPO regression	7 年前
vincentpierre	e5a59e9b	[Refactor] renamed is_continuous to is_continuous_action and added is_continuous_observation to decrease confusion	7 年前
eshvk	2d2eb64b	[containers] Enables container support for scenes that use visual observations	7 年前
GitHub	e43c069e	Merge pull request #547 from Unity-Technologies/develop-feature-docker-improvements [containers] Enables container support for scenes that use visual observations	7 年前
GitHub	237b41f9	Hotfix 0.3.0c (#618 ) Fixes the following issues: * Missing component reference in BananaRL environment. * Neural Network for multiple visual observations was not properly generated. * Episode time-out value estimate bootstrapping used incorrect observation as input.	7 年前
GitHub	1a449e98	Hotfix 0.3.1b (#637 ) * [Fix] Use the stored agent info instead of the previous agent info when bootstrapping the value * [Bug Fix] Addressed #643 * [Added Line Break]	7 年前
vincentpierre	076c8744	Report means instead of totals for losses (#580 ) * Report means instead of totals for losses. * Report absolute loss for policy.	7 年前
GitHub	b2675216	Hotfix 0.3.1b (#656 ) * [Fix] Use the stored agent info instead of the previous agent info when bootstrapping the value * [Bug Fix] Addressed #643 * [Added Line Break]	7 年前
GitHub	755be43e	[Cold Fix] Making the episode length and mean reward more accurate for the first episode (#657 )	7 年前
GitHub	3b866e9f	Use Clipped Gaussian (#649 ) This PR makes the following changes: * Moves clipping of continuous control model into model itself. Output is now always [-1, 1]. * Internal model values are now clipped between [-3, 3] before being rescaled to [-1, 1] for output. * This improves training performance by providing a wider range of values within which the pdf of the gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly. * Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance. * Introduce ScaleAction() function within python to easily rescale values from [-1, 1] to arbitrary range. * Re-train all CC models using improved algorithm. All performance levels are equal or improved. In the case of Crawler, improvement is drastic. * Update documentation appropriately. * Made miscellaneous minor code style and optimization improvements within environments.	7 年前
Arthur Juliani	9477eaa9	Develop fix cumulative reward (#725 ) * [Cold Fix] Split the way cumulative rewards and episode length are counted. The reward is appended at each step to the cumulative reward. The episode count is ONLY incremented when d_t+1 is false	7 年前
GitHub	702d98c6	[Fix] The summary writer is now implemented in the abstract trainer class. (#806 ) Summary writer now displays "{}: Step: {}. No episode was completed since last summary." when there were no completed episodes	7 年前
GitHub	c17937ef	Curiosity Driven Exploration & Pyramids Environments (#739 ) * Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer. * To enable, set use_curiosity flag to true in hyperparameter file. * Includes refactor of unitytrainers model code to accommodate new feature. * Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.	7 年前
vincentpierre	a22c0f65	[fixing encoding_size]	7 年前
Arthur Juliani	d7338050	Enable concurrent sessions	7 年前
Arthur Juliani	5d402be9	Minor Optimizations (#836 )	7 年前
GitHub	8526dcfc	Fix for visual observations (#847 )	7 年前
GitHub	0f65e272	[Addresses #842 ] (#849 ) In the case the agent is done immediately after spawning, its stats are empty because the stats need at least 2 successive experiences to create the stats. By specifying the default value of 0, the error no longer appears	7 年前
GitHub	47fc38ab	Additional Tests & Bug Fixes (#854 ) * Add tests and fix for sparse tensor warning * Rename mock communicator parameter * Test longer sequences * Curiosity tests and bug fixes	7 年前
GitHub	6e6e8d96	Fix for CC models w/ RNN and Curiosity (#860 )	7 年前
vincentpierre	4c6439d5	[Attempted fix]	7 年前
GitHub	6df07946	Fix for Discrete observations + Curiosity (#866 )	7 年前
GitHub	68d6170f	Error message when using ODD and Curiosity (#883 ) * Remove extra bouncer brain hyperparameters * Add error when using curiosity+odd	7 年前
Arthur Juliani	5e48766d	Remove discrete observations	7 年前
Arthur Juliani	195ac934	Merge branch 'develop' into develop-runs # Conflicts: # python/learn.py # python/unitytrainers/trainer.py	7 年前
vincentpierre	e47cec56	[Initial Commit]	7 年前
unityjeffrey	0d67f311	changed ml agents to ml-agents	7 年前
unityjeffrey	19fb437a	changed to Unity ML-Agents Toolkit (english)	7 年前
Arthur Juliani	9701c3db	Merge branch 'hotfix-0' into release-v0.4-fix-curiosity-odd # Conflicts: # python/unitytrainers/ppo/trainer.py	7 年前
Arthur Juliani	0c6411c2	Use switch between old and new behavior	7 年前
Arthur Juliani	1bfbf67a	Simplify approach	7 年前
Arthur Juliani	cfb7cfef	Code clean-up	7 年前
Arthur Juliani	083cbff5	Add to docstring	7 年前
Arthur Juliani	c31f63b5	Fix typo	7 年前
GitHub	e50ac7ae	Merge branch 'develop' into hotfix-0	7 年前
Deric Pang	8380f2f2	Moved curriculum code out of environment code.	6 年前
Arthur Juliani	1eb701af	Merge remote-tracking branch 'origin/develop' into develop-value-estimates-ppo	6 年前
Arthur Juliani	f52d5a92	Merge remote-tracking branch 'origin/develop' into develop-runs	6 年前
GitHub	ef3025e6	Merge pull request #1004 from Unity-Technologies/develop-runs Enable multiple runs in learn.py	6 年前
GitHub	7d0990cf	Fix MultiBrain bug that was introduced with the value estimates (#1018 )	6 年前
Arthur Juliani	52865022	[Fix bug 1040] (#1062 )	6 年前
Arthur Juliani	3659bbcd	Develop multi discrete (#1022 ) Replace discrete control with multi-discrete control.	6 年前
Arthur Juliani	fee02a84	Attempted fix for #1059 (#1089 )	6 年前
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 年前
Arthur Juliani	17224292	Fix for Curiosity with ODD (#1107 ) This branch addresses the issue referenced in #1059	6 年前
GitHub	ded0d8c7	Develop action masking (#1080 ) * [Initial Commit] Modified the model.py file and the ppo/trainer.py file to use masked actions * Preliminary modifications to the python side of the code to enable action masking * Preliminary modifications to the C# side of the code to enable action masking * Preliminary modifications to the communication side of the code to enable action masking * Implemented action masking for BC Note : The actions of the teacher are not masked * More error messages for the action masking * fix pytests * Added Documentation * Address comment * Addressed Comments on docs * Addressed second comment on docs * Addressed comments for the python side of the code * Created the action masker and associated unit tests * Addressed comments on the C# side * Addressed the comment regarding action_masking_name * Addressed the comments	6 年前
Deric Pang	e55b1764	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 年前
Deric Pang	e0e02ae6	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 年前
Deric Pang	cdb41480	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 年前
GitHub	fbf92810	Refactor Trainers to use Policy (#1098 )	6 年前
GitHub	10d2a19d	Release v0.5 (Develop) (#1203 )	6 年前
GitHub	29084e77	Curriculum learning reward thresholding bug fix (#1141 )	6 年前
GitHub	d2c320dd	Remove graph scope (#1205 ) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 年前
GitHub	3c9603d6	Demonstration Recorder (#1240 )	6 年前
GitHub	840417ff	Use organized tags for tensorboard stats (#1248 )	6 年前
GitHub	6c354d16	New Learning Brain (#1303 ) * Initial Commit * attempt at refactor * Put all static methods into the CoreInternalBrain * improvements * more testing * modifications * renamed epsilon * misc * Now supports discrete actions * added discrete support and RNN and visual. Left to do is refactor and save variables into models * code cleaning * made a tensor generator and applier * fix on the models.py file * Moved the Checks to a different Class * Added some unit tests * BugFix * Need to generate the output tensors as well as inputs before executing the graph * Made NodeNames static and created a new namespace * Added comments to the TensorAppliers * Started adding comments on the TensorGenerators code * Added comments for the Tensor Generator * Moving the helper classes into a separate folder * Added initial comments to the TensorChecks * Renamed NodeNames -> TensorNames * Removing warnings in tests * Now using Aut...	6 年前
GitHub	b6c97cb6	Fix for divide-by-zero error with Discrete Actions (#1520 ) * Enable buffer padding to be set other than 0 Allows buffer padding in AgentBufferField to be set to a custom value. In particular, 0-padding for `action_masks` causes a divide-by-zero error, and should be padded with 1’s instead. This is done as a parameter passed to the `append` method, so that the pad value can be set right after the instantiation of an AgentBufferField.	6 年前
GitHub	c258b1c3	Move 'take_action' into Policy class (#1669 ) * Move 'take_action' into Policy class This refactor is part of Actor-Trainer separation. Since policies will be distributed across actors in separate processes which share a single trainer, taking an action should be the responsibility of the policy. This change makes a few smaller changes: * Combines `take_action` logic between trainers, making it more generic * Adds an `ActionInfo` data class to be more explicit about the data returned by the policy, only used by TrainerController and policy for now. * Moves trainer stats logic out of `take_action` and into `add_experiences` * Renames 'take_action' to 'get_action'	6 年前
eshvk	cc9bdf17	Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return	6 年前
eshvk	fb04c40c	Reorganize to make metrics collection more accurate	6 年前
GitHub	93760bc4	Adds SubprocessUnityEnvironment for parallel envs (#1751 ) This commit adds support for running Unity environments in parallel. An abstract base class was created for UnityEnvironment which a new SubprocessUnityEnvironment inherits from. SubprocessUnityEnvironment communicates through a pipe in order to send commands which will be run in parallel to its workers. A few significant changes needed to be made as a side-effect: * UnityEnvironments are created via a factory method (a closure) rather than being directly created by the main process. * In mlagents-learn "worker-id" has been replaced by "base-port" and "num-envs", and worker_ids are automatically assigned across runs. * BrainInfo objects now convert all fields to numpy arrays or lists to avoid serialization issues.	6 年前
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black ). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 年前
Vincent(Yuan) Gao	a15763f8	Clear cumulative_returns_since_policy_update (#2120 ) Before the CSV file's mean rewards would lag much behind the rest of the code since this buffer was never cleared.	6 年前
GitHub	a4d5b2d3	Doc/comment cleanup - Fix some occurrences of 'the the' (#2119 )	6 年前
GitHub	2671e1a0	Enable mypy in precommit checks (#2177 ) * WIP precommit on top level * update CI * circleci fixes * intentionally fail black * use --show-diff-on-failure in CI * fix command order * rebreak a file * apply black * WIP enable mypy * run mypy on each package * fix trainer_metrics mypy errors * more mypy errors * more mypy * Fix some partially typed functions * types for take_action_outputs * fix formatting * cleanup * generate stubs for proto objects * fix ml-agents-env mypy errors * disallow-incomplete-defs for gym-unity * Add CI notes to CONTRIBUTING.md	6 年前
GitHub	4ac79742	Refactor reward signals into separate class (#2144 ) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 年前
Jonathan Harper	177ee5b8	Remove unused "last reward" logic, TF nodes At each step, an unused `last_reward` variable in the TF graph is updated in our PPO trainer. There are also related unused methods in various places in the codebase. This change removes them.	5 年前
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209 ) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 年前
Chris Elion	bb7773c1	add flake8 to precommit	5 年前
GitHub	9c50abcf	GAIL and Pretraining (#2118 ) Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes: - A new GAILRewardSignal and GAILModel for GAIL/VAIL - A BCModule component (not a reward signal) to do pretraining during RL - Documentation for both of these - Change to Demo Loader that lets you load multiple demo files in a folder - Example Demo files for all of our tested sample environments (for future regression testing)	5 年前
GitHub	1c18bd18	Swap 0 set and reward buffer append (#2273 ) Fix bug with reward_buffer always 0	5 年前
GitHub	a5b7cf95	Fix get_value_estimate and buffer append (#2276 ) Fixes shuffling issue with newer versions of numpy (#1798). * make get_value_estimates output a dict of floats * Use np.append instead of convert to list, unconvert * Add type hints and test for get_value_estimates	5 年前
Chris Elion	5d07ca1f	Merge remote-tracking branch 'origin/develop' into enable-flake8	5 年前
GitHub	be4292fb	Add different types of visual encoder (nature cnn/resnet) Add resnet and nature cnn in addition to default visual encoder	5 年前
GitHub	9eb3f049	Cleanup unused code in TrainerController (#2315 ) * Removes unused SubprocessEnvManager import in trainer_controller * Removes unused `steps` argument to `TrainerController._save_model` * Consolidates unnecessary branching for curricula in `TrainerController.advance` * Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds `BCTrainer` support so that we don't have a broken interface / undefined behavior when BCTrainer is used with curricula.	5 年前
GitHub	6225317d	refactor vis_encoder_type and add to doc refactor vis_encoder_type and add to doc	5 年前
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288 ) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 年前
GitHub	d7ebaae1	Return list instead of np array for make_mini_batch() (#2371 ) Return list instead of np array for make_mini_batch() to reduce time copying data	5 年前
GitHub	7b69bd14	Refactor Trainer and Model (#2360 ) - Move common functions to `trainer.py`, `model.py` from `ppo/trainer.py`, `ppo/policy.py` and `ppo/model.py` - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 年前
GitHub	bd7eb286	Update reward signals in parallel with policy (#2362 )	5 年前
GitHub	689765d6	Modification of reward signals and rl_trainer for SAC (#2433 ) * Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo. * Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals. * Moves end_episode to rl_trainer * Fixed bug with BCModule with RNN	5 年前
GitHub	43696d60	Fix bug in add_rewards_output and add test (#2442 )	5 年前
GitHub	832e4a47	Normalize observations when adding experiences (#2556 ) * Normalize observations when adding experiences This change moves normalization of vector observations into the trainer's "add_experiences" interface. Prior to this change, normalization occurred at inference time. This was somewhat confusing since usually executing a forward pass shouldn't have side-effects which would change the training step. Also, in an asynchronous or distributed setting where we copy the neural network weights from a trainer to a remote actor / inference worker we'd end up with training issues because of the weights being different on the trainer than the workers.	5 年前
GitHub	67d754c5	Fix flake8 import warnings (#2584 ) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 年前
Ervin Teng	094cbe4d	Fix bug when batch size is a non-multiple of sequence length (#2661 )	5 年前
GitHub	5d3e05d1	Fix "memory leak" during inference (#2722 ) * Clear buffer if not training * Add tests	5 年前
GitHub	4da157fe	more pylint fixes (#2842 )	5 年前
Ervin Teng	748c250e	Somewhat running	5 年前
Andrew Cohen	13fe9cf8	Bubbled up indexing of AllBrainInfo to trainer controller from trainers	5 年前
Andrew Cohen	e96b80db	receives brain_name and identifier on python side	5 年前
Ervin Teng	df5ee7bf	Split buffer into two buffers (PPO works)	5 年前
Ervin Teng	3a4fa244	Switch to tanh squash in PPO	5 年前
Ervin Teng	fd0647a6	Rename append_update_buffer to append_to_update_buffer	5 年前
GitHub	652488d9	check for numpy float64 (#2948 )	5 年前
GitHub	213cd68d	Split Buffer into processing and update buffers (#2964 ) This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps: - Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR) - Move the processing buffer and add/process experiences into a separate, outside class - Change the data type of the update buffer to be a Trajectory - Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers	5 年前
Ervin Teng	2c9376bc	Convert to trajectory	5 年前
Ervin Teng	9e661f0c	Looks like it's training	5 年前
Ervin Teng	a97ffb47	Attempt reward reporting	5 年前
Ervin Teng	9c5fdd31	Stats reporting is working	5 年前
Ervin Teng	eb4a04a5	Merge branch 'master' into develop-tanhsquash	5 年前
GitHub	3b4b0d55	Remove random normal epsilon (#3039 )	5 年前
Ervin Teng	e0e57188	Clean up some stuff	5 年前
Andrew Cohen	5097bcc0	receives brain_name and identifier on python side	5 年前
Ervin Teng	f94365a2	No longer using ProcessingBuffer for PPO	5 年前
Ervin Teng	8b3b9e6c	Move trajectory and related functions to trajectory.py	5 年前
Ervin Teng	76abf968	Add back max_step logic	5 年前
Andrew Cohen	8578b0b7	add_policy and create_policy separated	5 年前
GitHub	36048cb6	Moving Env Manager to Trainers (#3062 ) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 年前
Ervin Teng	c9116ed2	Move some common logic to buffer class	5 年前
GitHub	90db165f	Add --namespace-packages to mypy for mlagents (#3075 )	5 年前
Andrew Cohen	614d276f	receives brain_name and identifier on python side	5 年前
Andrew Cohen	96922f84	receives brain_name and identifier on python side	5 年前
Ervin Teng	27c2a55b	Lots of test fixes	5 年前
Ervin Teng	97d66e71	Remove BootstrapExperience	5 年前
Ervin Teng	324d217b	Move agent_id to Trajectory	5 年前
Ervin Teng	77ff4822	Add back next_obs	5 年前
Andrew Cohen	d1edbf43	add_policy and create_policy separated	5 年前
Ervin Teng	2b811fc8	Properly report value estimates and episode length	5 年前
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067 )	5 年前
Andrew Cohen	de902fbb	passes all pytest and C# tests	5 年前
GitHub	2ac242f7	Remove TrainerMetrics and add CSVWriter using new StatsWriter API (#3108 )	5 年前
Ervin Teng	fdf9aea7	Make conversion methods part of NamedTuples	5 年前
Ervin Teng	6242b67d	Add way to check if trajectory is done or max_reached	5 年前
GitHub	0b5b1b01	Develop magic string + trajectory (#3122 ) * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * added team id and identifier concat to behavior parameters * splitting brain params into brain name and identifiers * set team id in prefab * receives brain_name and identifier on python side * rebased with develop * Correctly calls concatBehaviorIdentifiers * trainer_controller expects name_behavior_ids * add_policy and create_policy separated * adjusting tests to expect trainer.add_policy to be called * fixing tests * fixed naming ...	5 年前
GitHub	c7da0139	Fix mypy errors in trainer code. (#3135 )	5 年前
Andrew Cohen	082789ea	Merge branch 'master' into develop-magic-string	5 年前
Andrew Cohen	6a4e7cf9	added ppo/sac_policy attributes to keep up with master	5 年前
Ervin Teng	1bd791e5	Merge branch 'master' into develop-agentprocessor	5 年前
Andrew Cohen	3e76adbd	fixing more ci tests	5 年前
Ervin Teng	e577d5ea	Fix some mypy issues and remove unused code	5 年前
Andrew Cohen	c3a92afa	fixing ci ppo_policy	5 年前
Ervin Teng	9e0ef912	Fixed value estimate bug	5 年前
GitHub	bec2e8f0	Add Trajectory/Policy Queues, move Trainer logic to advance() (#3113 )	5 年前
Ervin Teng	db743971	Move private methods out of trainer, simplify interface	5 年前
Ervin Teng	b3a4e641	Remove some vestigial code	5 年前
Ervin Teng	48793ec1	Fix test	5 年前
GitHub	5bc7531b	Get step from policy (#3223 )	5 年前
Ervin Teng	cd74e51b	More progress	5 年前
Ervin Teng	76ad64d7	Some more bugfixes	5 年前
Ervin Teng	29f3330f	Merge master into hotfix-0.13.1	5 年前
GitHub	329b23e0	Fix extra summary being written when loading from checkpoint (#3272 ) * Load next summary properly * Add tests for add_policy and get_policy	5 年前
Ervin Teng	164732a9	Move optimizer creation to Trainer, fix some of the reward signals	5 年前
Ervin Teng	151e3b1c	Move policy to common location, remove epsilon	5 年前
Ervin Teng	db249ceb	Merge branch 'master' into develop-splitpolicyoptimizer	5 年前
Ervin Teng	edeceefd	Zeroed version of LSTM working for PPO	5 年前
Ervin Teng	cfc2f455	Fix BC and tests	5 年前
Ervin Teng	78671383	Move initialization call around	5 年前
GitHub	dd86e879	Separate out optimizer creation and policy graph creation (#3355 )	5 年前
Ervin Teng	00017bab	Temporarily remove multi-GPU	5 年前
Ervin Teng	be9d772e	Add option to not condition sigma on obs	5 年前
Ervin Teng	88998fc9	Add add_policy docstrings	5 年前
GitHub	e4177de0	[change] Organize trainer files a bit better (#3538 )	5 年前
GitHub	cb153a0f	[change] Change warning language when adversarial scene is used without self-play (#3561 )	5 年前
GitHub	c42a11c3	[change] Throw a proper error when sequence length is greater than batch size. (#3583 )	5 年前
GitHub	ec278616	Hotfixes for Release 0.15.1 (#3698 ) * [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting ...	5 年前
GitHub	6709a9bf	[change] Clean up trainer interface, clean up GhostTrainer stats (#3634 )	5 年前
Andrew Cohen	9f09a65d	team id centric ghost trainer	5 年前
GitHub	4ecd6ad3	Fix how we set logging levels (#3703 ) * cleanup logging * comments and cleanup * pylint, gym	5 年前
Andrew Cohen	59b88be6	Merge branch 'master' into self-play-mutex	5 年前
Andrew Cohen	3de78baa	wrapped trainer has internal policy ghost	5 年前
Andrew Cohen	3013774b	alternative to internal-policy fix	5 年前
Ervin Teng	f29b17a9	Don't block one policy queue Only put policies when policy is actually updated	5 年前
Anupam Bhatnagar	eb9f3f19	[skip ci] replace buffer length by buffer size	5 年前
Anupam Bhatnagar	ac80ec82	[skip ci] increment steps on training	5 年前
Anupam Bhatnagar	d49ceecc	[skip ci] moving summary writer to update_policy [skip ci] more fixes [skip ci] tweaking 3dball configs [skip ci] swap summary writer and step increment order	5 年前
Anupam Bhatnagar	95ba923d	[skip ci] fix first summary statement output	5 年前
Ervin Teng	5e980ec1	Merge branch 'master' into develop-sac-apex	5 年前
Anupam Bhatnagar	45bac63e	[skip ci] more fixes	5 年前
Anupam Bhatnagar	9d7dd3b6	[skip ci] moving step increment to trainer from environment for sac	5 年前
Arthur Juliani	7c3bd376	Refactoring policy and optimizer	5 年前
Arthur Juliani	3c82bf59	Training runs, but doesn’t actually work	5 年前
Arthur Juliani	8c6f4696	Fix a couple additional bugs	5 年前
Arthur Juliani	61d671d8	Add conditional sigma for distribution	5 年前
Arthur Juliani	212e2d1d	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 年前
GitHub	232519e4	[refactor] Move output artifacts to a single results/ folder (#3829 )	5 年前
Arthur Juliani	ca887743	Support tf and pytorch alongside one another	5 年前
GitHub	422247a0	update versions for patch release (#3970 ) * update versions for patch release * Update precommit flake8 (#3961) * fix changelog	5 年前
GitHub	4641038e	Renaming max_step to interrupted in TerminalStep(s) (#3908 )	5 年前
Arthur Juliani	89ad3020	Merge remote-tracking branch 'origin/master' into develop-add-fire # Conflicts: # ml-agents/mlagents/trainers/policy/tf_policy.py	5 年前
Christopher Goy	ba80b292	format files with pre-commit.	4 年前
GitHub	e274bcf6	Update precommit flake8 (#3961 ) * fix flake8 errors * update flake8 hook * update flake8 plugins	5 年前
Andrew Cohen	0e965a4d	sensitivity	5 年前
Andrew Cohen	23b84dea	ignoring commit checks but write to csv	5 年前
Andrew Cohen	61aa9915	write to csv	5 年前
Arthur Juliani	28e095e0	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 年前
Ervin Teng	f214836a	Changes for speed test	5 年前
GitHub	e92b4f88	[refactor] Structure configuration files into classes (#3936 )	5 年前
GitHub	09853e13	[refactor] Move checkpoint saving into trainer (#4034 )	5 年前
GitHub	7229214c	[cleanup] Remove unused param keys (#4067 )	5 年前
GitHub	a1c63c4b	Release 3 Cherry-pick bug-fixes and doc changes from master (#4102 ) * [bug-fix] Fix regression in --initialize-from feature (#4086) * Fixed text in GettingStarted page specifying the logdir for tensorboard. Before it was in a directory summaries which no longer existed. Results are now saved to the results dir. (#4085) * [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087) * Reverting bug introduced in #4071 (#4101) Co-authored-by: Scott <Scott.m.jordan91@gmail.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	5 年前
Anupam Bhatnagar	4afd8f92	first commit	5 年前
Anupam Bhatnagar	8b6c19ae	[skip ci] adding should_still_train method to ppo	5 年前
Arthur Juliani	9724c9ac	Merge master	5 年前
Arthur Juliani	46874cc7	ONNX exporting	5 年前
GitHub	05a11c96	Develop add fire exp framework (#4213 ) * Experiment branch for comparing torch * Updates and merging ervin changes * improvements on experiment_torch.py * Better printing of results * preliminary gpu experiment * Testing gpu * Prepare to see a lot of commits, because I like my IDE and I am testing on a server and I am using git to sync the two * Prepare to see a lot of commits, because I like my IDE and I am testing on a server and I am using git to sync the two * _ * _ * _ * _ * _ * _ * _ * _ * Attempt at gpu on tf. Does not work * _ * _ * _ * _ * _ * _ * _ * _ * _ * _ * _ * Fixing learn.py	4 年前
GitHub	45154f52	Pytorch port of SAC (#4219 )	4 年前
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263 ) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 年前
GitHub	69579611	[refactor] Refactor Actor and Critic classes (#4287 )	4 年前
Ruo-Ping Dong	6feec58a	add Saver class (only TF working)	4 年前
GitHub	93517833	[feature] Fix TF tests, add --torch CLI option, allow run TF without torch installed (#4305 )	4 年前
GitHub	7ddfd81f	Added Reward Providers for Torch (#4280 ) * Added Reward Providers for Torch * Use NetworkBody to encode state in the reward providers * Integrating the reward prodiders with ppo and torch * work in progress, integration with PPO. Not training properly Pyramids at the moment * Integration in PPO * Removing duplicate file * Gail and Curiosity working * addressing comments * Enfore float32 for tests * enfore np.float32 in buffer	4 年前
Ruo-Ping Dong	71fe4df6	fix formatting and test	4 年前
Ruo-Ping Dong	09a741c8	small improvement	4 年前
Ruo-Ping Dong	79d89158	Merge branch 'develop-add-fire' into develop-add-fire-checkpoint	4 年前
GitHub	3bcb029b	[refactor] Remove BrainParameters from Python code (#4138 )	4 年前
Ruo-Ping Dong	e06812aa	fix tests	4 年前
GitHub	84440f05	Convert checkpoints to .NN (#4127 ) This change adds an export to .nn for each checkpoint generated by RLTrainer and adds a NNCheckpointManager to track the generated checkpoints and final model in training_status.json. Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>	4 年前
GitHub	1f5eb9da	add pyupgrade to pre-commit and run (#4239 )	4 年前
GitHub	129f9ddc	[MLA-427] make pyupgrade convert f-strings too (#4244 ) * make pyupgrade convert f-strings too	4 年前
HH	9e6edb6c	try new reward falloff	4 年前
HH	c3c83920	cleanup	4 年前
Andrew Cohen	d8c123a0	Merge branch 'master' into sensitivity	4 年前
Andrew Cohen	02df39ab	ignore precommit	4 年前
Andrew Cohen	fa35292c	write hist to tb	4 年前
GitHub	1b098c9a	Refactor TFPolicy and Policy (#4254 ) * Refactor TFPolicy and Policy	4 年前
GitHub	beb5aca5	[refactor] Make classes except Optimizer framework agnostic (#4268 )	4 年前
Andrew Cohen	06e4356c	Merge branch 'master' into sensitivity	4 年前
GitHub	3f44a0bc	cleanup around AdamOptimizer (#4333 ) * cleanup around AdamOptimizer * methods to creat Optimizer instances	4 年前
Ruo-Ping Dong	d3eb6c46	Merge branch 'develop-add-fire' into develop-add-fire-checkpoint	4 年前
Ruo-Ping Dong	95858e25	update saver interface and add tests	4 年前
Ruo-Ping Dong	523248be	update	4 年前
HH	8eaddb61	Merge branch 'master' into hh/develop/loco-walker-variable-speed	4 年前
Ruo-Ping Dong	409a161c	fix bc tests	4 年前
GitHub	25dc8c3d	Add Saver Class to handle all save/load/checkpoint/export work (#4323 )	4 年前
Ervin Teng	d65a9326	Merge branch 'master' into develop-add-fire-mm3	4 年前
Ruo-Ping Dong	d57aa9ab	Merge branch 'develop-add-fire-mm3' into develop-add-fire-checkpoint	4 年前
GitHub	8985a040	Removing the experiment script from add fire (#4373 ) * Removing the experiment script * Removing the script	4 年前
Andrew Cohen	a65d08c7	ghost trainer tests	4 年前
GitHub	49545ce1	Pytorch ghost trainer (#4370 )	4 年前
Andrew Cohen	fcec6734	added comments	4 年前
GitHub	0d0d2ead	[add-fire] Revert unneeded changes back to master (#4389 )	4 年前
Andrew Cohen	e7c9ff35	clean up docstrings create policies	4 年前
Andrew Cohen	039ae17f	capitalize Tensorflow	4 年前
GitHub	1955af9e	[feature] Add experimental PyTorch support (#4335 ) * Begin porting work * Add ResNet and distributions * Dynamically construct actor and critic * Initial optimizer port * Refactoring policy and optimizer * Resolving a few bugs * Share more code between tf and torch policies * Slightly closer to running model * Training runs, but doesn’t actually work * Fix a couple additional bugs * Add conditional sigma for distribution * Fix normalization * Support discrete actions as well * Continuous and discrete now train * Mulkti-discrete now working * Visual observations now train as well * GRU in-progress and dynamic cnns * Fix for memories * Remove unused arg * Combine actor and critic classes. Initial export. * Support tf and pytorch alongside one another * Prepare model for onnx export * Use LSTM and fix a few merge errors * Fix bug in probs calculation * Optimize np -> tensor operations * Time action sample funct...	4 年前
Ruo-Ping Dong	c47ffc20	Rename saver	4 年前
Ruo-Ping Dong	27fb4270	brain_name to behavior_name	4 年前
Ruo-Ping Dong	f5dee9d1	jit for continuous control	4 年前
GitHub	6f534366	Add torch_utils class, auto-detect CUDA availability (#4403 ) * Add torch_utils * Use torch from torch_utils * Add torch to banned modules in CI * Better import error handling * Fix flake8 errors * Address comments * Move networks to GPU if enabled * Switch to torch_utils * More flake8 problems * Move reward providers to GPU/CPU * Remove anothere set default tensor * Fix banned import in test	4 年前
Andrew Cohen	643c8e58	ppo extended	4 年前
GitHub	c188781b	[life improvement] Moving Python files around (#4531 ) * Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder * Addressing comments * Editing the migrating doc * fixing test	4 年前
Ervin Teng	b3e15d30	Always use separate critic	4 年前
Andrew Cohen	e5f14400	Merge branch 'master' into develop-hybrid-actions-singleton	4 年前
GitHub	a690af74	[refactor] Make PyTorch the default and TensorFlow optional (#4517 ) * Torch setup.py * Set torch to default * Make torch default in setup.py * Remove indents * Remove other instances of TF being used * Add tensorboard to setup.py * Adding correst setup commands for verifying torch is installed (#4524) * Adding correst setup commands for verifying torch is installed * Editing the test_requirments to add tf and remove torch * Develop torchdefault raise outside setup (#4530) * Torch not imported error to raise at first usage * Torch not imported error to raise at first usage * [refactor] Use PyTorch TensorBoard utils (#4518) * Convert stats writer to use PyTorch TB support * Use common function to print params * Update test * Bump tensorboard to 1.15 to fix the tests * putting tensorboard 1.15.0 as min version requirement Co-authored-by: vincentpierre <vincentpierre@unity3d.com> * [Docs] Initial documentation changes for making...	4 年前
Andrew Cohen	8013e544	ignoring Instance of 'AbstractContextManager' has no 'enter_context' member (no-member)	4 年前
GitHub	cb8e4d25	Add ActionSpec (#4586 ) Co-authored-by: Ervin T <ervin@unity3d.com>	4 年前
Andrew Cohen	9689cf2c	remove _action_ from function names	4 年前
GitHub	3c96a3a2	Action Model (#4580 ) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 年前
GitHub	88d3ec3e	Merge master into hybrid actions staging branch (#4704 )	4 年前
Ervin Teng	184f27c6	Make buffer type-agnostic	4 年前
Ervin Teng	0cdb2040	Use tanh squash	4 年前
Ervin Teng	3b15cc32	Multiprocessing but Stats are quite broken	4 年前
Ervin Teng	3765c15a	Merge branch 'develop-multitype-buffer' into develop-unified-obs	4 年前
Ervin Teng	7a0ebfbd	Pretty broken	4 年前
Ervin Teng	95bdbba3	Less broken PPO	4 年前
vincentpierre	b863af57	Removing TensorFlow Trainers	4 年前
vincentpierre	713e65fb	removing tensorflow testing for pytest and yamato	4 年前
vincentpierre	2dd34aa5	Formatting	4 年前
vincentpierre	735fcd52	[WIP] Refactor trainers to use list of obs rather than vec and vis obs	4 年前
vincentpierre	93ca1409	fixing the tests	4 年前
Ervin Teng	56dcd75a	Get next critic observations into value estimate	4 年前
GitHub	cc6b4564	Multi Directional Walker and Initial Hypernetwork (#4740 )	4 年前
GitHub	22658a40	use sensor types to differentiate obs (#4749 )	4 年前
Ervin Teng	330fc1d0	Merge branch 'master' into develop-centralizedcritic-mm	4 年前
Ervin Teng	6b8b3db3	Try subtract marginalized value	4 年前
Ervin Teng	2203fc0e	Bootstrap if teammates not done	4 年前
Ervin Teng	092ea232	Some more progress - still broken	4 年前
Ervin Teng	457b2630	I think it's running	4 年前
Andrew Cohen	6e1826f8	might be right	4 年前
vincentpierre	52b011d6	_	4 年前
Andrew Cohen	feb38012	add lambda return and target network	4 年前
Andrew Cohen	79c658d2	remove normalize advantages	4 年前
Andrew Cohen	a4c336c2	value estimator	4 年前
Andrew Cohen	bd341f7f	no target, increase lambda	4 年前
Andrew Cohen	bdd73403	remove prints	4 年前
Andrew Cohen	8a5d291f	use v return	4 年前
Andrew Cohen	fce842aa	adding zombie to coma2 brnch	4 年前
Andrew Cohen	7f491ae7	cloud run with coma2 of held out zombie test env	4 年前
Andrew Cohen	9af22d30	use only value funcs	4 年前
Andrew Cohen	a3453c5d	target of baseline is returns_v	4 年前
Andrew Cohen	511a9a7e	no baseline	4 年前
Andrew Cohen	95253b47	ntegrate teammate dones	4 年前
Andrew Cohen	687f411b	try again on cloud	4 年前
Ervin Teng	3aefac39	Use GAE again	4 年前
GitHub	64fc7f43	Buffer key enums (#4907 )	4 年前
Ervin Teng	adad5183	Weight decay, regularizaton loss	4 年前
Ervin Teng	4fe8d036	Try reduce bias	4 年前
Ervin Teng	6094613d	try reduce bias more	4 年前
Andrew Cohen	74885bab	add local reward to plot	4 年前
Andrew Cohen	c08fefbc	reduce initialization weights	4 年前
Ervin Teng	a9116382	Bug fixes	4 年前
Andrew Cohen	98d647de	MultiInputNetBody	4 年前
Ervin Teng	ae7643b8	Proper critic memories for PPO	4 年前
Ervin Teng	97842f81	Fix non-lstm PPO	4 年前
Ervin Teng	e46a86ad	Merge branch 'master' into develop-superpush-int	4 年前
Ervin Teng	9bc88c41	Running COMA (not sure if learning)	4 年前
Ervin Teng	2f209c12	Buffer fixes (cherry picked from commit 2c03d2b544d0c615e7b60d939f01532674d80753)	4 年前
Ervin Teng	61781a1a	Merge branch 'main' into develop-agentprocessor-teammanager	4 年前
GitHub	f16ce486	Update v2-staging from main (March 15) (#5123 )	4 年前
GitHub	47db8ce1	[bug-fix] Fix padding for List entries in buffer (#5046 ) * Fix padding for List entries in buffer * Revert to coonverting to np.array * Fix dtype in PPO trainer	4 年前
GitHub	62314056	Fix ghost curriculum and make steps private (#5098 ) * use get step to determine curriculum * add to CHANGELOG * Make step in trainer private (#5099) Co-authored-by: Ervin T <ervin@unity3d.com>	4 年前
Ervin Teng	d1c24251	[bug-fix] When agent isn't training, don't clear update buffer (#5205 ) * Don't clear update buffer, but don't append to it either * Update changelog * Address comments * Make experience replay buffer saving more verbose (cherry picked from commit 63e7ad44d96b7663b91f005ca1d88f4f3b11dd2a)	4 年前

318 commits (ecef019c-02cb-4416-b3fd-ae57e8ca074f)