ml-agents

Author	SHA1	Message	Commit date
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
GitHub	9c50abcf	GAIL and Pretraining (#2118) Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes: - A new GAILRewardSignal and GAILModel for GAIL/VAIL - A BCModule component (not a reward signal) to do pretraining during RL - Documentation for both of these - Change to Demo Loader that lets you load multiple demo files in a folder - Example Demo files for all of our tested sample environments (for future regression testing)	5 years ago
GitHub	78c0c202	fix mock_brain (#2377) fix mock_brain	5 years ago
GitHub	b498c19d	Fix BCTrainer increment_steps (#2384)	5 years ago
GitHub	d7ebaae1	Return list instead of np array for make_mini_batch() (#2371) Return list instead of np array for make_mini_batch() to reduce time copying data	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
Ervin Teng	072d2ef8	Merge latest develop	5 years ago
GitHub	689765d6	Modification of reward signals and rl_trainer for SAC (#2433) * Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo. * Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals. * Moves end_episode to rl_trainer * Fixed bug with BCModule with RNN	5 years ago
GitHub	b73fa378	Add more extensive tests for BC trainer (#2506) * Add more extensive tests for BC trainer * Break up tests for BC trainer	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
Ervin Teng	e826f4bb	Bugfix for LSTM+BC (#2679) * Fix LSTM+BC in discrete case * Add test for Barracuda export * Fix LSTM training for BC	5 years ago
GitHub	68965c7b	Use a class for camera res, not dict (#2656)	5 years ago
GitHub	24ba9d58	Develop deprecate broadcasting (#2669) * Feature Deprecation : Online Behavioral Cloning In this PR : - Delete the online_bc_trainer - Delete the tests for online bc - delete the configuration file for online bc training * Deleting the BCTeacherHelper.cs Script TODO : - Remove usages in the scene - Documentation Edits DO NOT MERGE * IMPORTANT : REMOVED ALL IL SCENES - Removed all the IL scenes from the Examples folder * Removed all mentions of online BC training in the Documentation * Made a note in the Migrating.md doc about the removal of the Online BC feature. * Modified the Academy UI to remove the control checkbox and replaced it with a train in the editor checkbox * Removed the Broadcast functionality from the non-Learning brains * Bug fix * Note that the scenes are broken since the BroadcastHub has changed * Modified the LL-API for Python to remove the broadcasting functiuonality. * All unit tests are running * Modifie...	5 years ago
GitHub	e6240c7a	Bugfix for LSTM+BC (#2679) * Fix LSTM+BC in discrete case * Add test for Barracuda export * Fix LSTM training for BC	5 years ago
GitHub	619465e1	Fix crash when SAC is used with Curiosity and Continuous Actions (#2740) * Add test for curiosity + SAC * Use actions for all curiosity (need to test on PPO) * Fix issue with reward signals updating multiple times * Put curiosity actions in the right placeholder * Test PPO curiosity update	5 years ago
GitHub	0892ef2c	[WIP] ISensor interface and use for visual observations (#2731) * ISensor and SensorBase * camera and rendertex first pass * use isensors for visual obs * Update gridworld with CameraSensors * compressed obs for reals * Remove AgentInfo.visualObservations * better separation of train and inference sensor calls * compressed obs proto - need CI to generate code * int32 * get proto name right * run protoc locally for new fiels * apply generated proto patch (pyi files were weird) * don't repeat bytes * hook up compressedobs * dont send BrainParameters until there's an AgentInfo * python BrainParameters now needs an AgentInfo to create * remove last (I hope) dependency on camerares * remove CameraResolutions and AgentInfo.visual_observations * update mypy-protobuf version * cleanup todos * python cleanup * more unit test fixes * more unit test fix * camera sensors for VisualFood collector, record demo * SensorCompon...	5 years ago
GitHub	ccb7eab4	Remove {text,custom} {action,observations} (#2839) * delete text actions and obs * delete custom actions and obs * regenerate protos * cleanup C# * format * fix tests * fix base env signature * doc cleanup	5 years ago
GitHub	e6f549dc	[MLA-12] update protobuf for vector observations (#2862)	5 years ago
GitHub	69d1a033	Develop remove past action communication (#2913) * Modifying the .proto files * attempt 1 at refactoring Python * works for ppo hallway * changing the documentation * now works with both sac and ppo both training and inference * Ned to fix the tests * TODOs : - Fix the demonstration recorder - Fix the demonstration loader - verify the intrinsic reward signals work - Fix the tests on Python - Fix the C# tests * Regenerating the protos * fix proto typo * protos and modifying the C# demo recorder * modified the demo loader * Demos are loading * IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT * Modified all the demo files * Fixing all the tests * fixing ci * addressing comments * removing reference to memories in the ll-api	5 years ago
Ervin Teng	29cdf77a	Fix RL tests	5 years ago
Ervin Teng	3a4fa244	Switch to tanh squash in PPO	5 years ago
Ervin Teng	fd0647a6	Rename append_update_buffer to append_to_update_buffer	5 years ago
GitHub	652488d9	check for numpy float64 (#2948)	5 years ago
GitHub	213cd68d	Split Buffer into processing and update buffers (#2964) This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps: - Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR) - Move the processing buffer and add/process experiences into a separate, outside class - Change the data type of the update buffer to be a Trajectory - Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers	5 years ago
Ervin Teng	eb4a04a5	Merge branch 'master' into develop-tanhsquash	5 years ago
GitHub	3b4b0d55	Remove random normal epsilon (#3039)	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
Ervin Teng	336ca456	Kill the ProcessingBuffer	5 years ago
Ervin Teng	27c2a55b	Lots of test fixes	5 years ago
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067)	5 years ago
GitHub	7fbf6b1d	add flake8-bugbear (#3137) * unused loop variables * change loop variable	5 years ago
GitHub	29c91b14	update flake8 plugin version and fix warnings (#3180)	5 years ago
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207)	5 years ago
Ervin Teng	48b39b80	Fix ghost trainer and all tests	5 years ago
Ervin Teng	7b0f700b	Add test for deletion calls	5 years ago
Ervin Teng	5ef902bf	Merge branch 'master' into develop-splitpolicyoptimizer	5 years ago
GitHub	870338b4	[bug-fix] Fix issue with more than one continuous actions (#3547)	5 years ago
GitHub	43f23ee3	WIP : Changes to the LL-API - Refactor of “done” logic (#3681) * [skip ci] WIP : Modify the base_env.py file * [skip ci] typo * [skip ci] renamed some methods * [skip ci] Incorporated changes from our meeting * [skip ci] everything is broken * [skip ci] everything is broken * [skip ci] formatting * Fixing the gym tests * Fixing bug, C# has an error that needs fixing * Fixing the test * relaxing the threshold of 0.99 to 0.9 * fixing the C# side * formating * Fixed the llapi integratio test * [Increasing steps for testing] * Fixing the python tests * Need __contains__ after all * changing the max_steps in the tests * addressing comments * Making env_manager logic clearer as proposed in the comments * Remove duplicated logic and added back in episode length (#3728) * removing mentions of multi-agent in gym and changed the docstring in base_env.py * Edited the Documentation for the changes to the LLAPI (#3733) * Edite...	5 years ago
GitHub	4641038e	Renaming max_step to interrupted in TermialStep(s) (#3908)	5 years ago
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 years ago
Andrew Cohen	4b094d25	large normalization obs unit test	4 years ago
Andrew Cohen	8013e544	ignoring Instance of 'AbstractContextManager' has no 'enter_context' member (no-member)	4 years ago
GitHub	cb8e4d25	Add ActionSpec (#4586) Co-authored-by: Ervin T <ervin@unity3d.com>	4 years ago
Andrew Cohen	9689cf2c	remove _action_ from function names	4 years ago
Andrew Cohen	590adc01	make_fake_trajectory/step take ActionSpec arg	4 years ago
GitHub	b853e5ba	Action buffer (#4612) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
GitHub	3c96a3a2	Action Model (#4580) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
Andrew Cohen	0e28dd8f	add static method to create continuous/discrete	4 years ago
GitHub	88d3ec3e	Merge master into hybrid actions staging branch (#4704)	4 years ago
Andrew Cohen	b6d10456	removed action_spec.size	4 years ago
Arthur Juliani	b8f22fd7	Update second half of tests	4 years ago
Andrew Cohen	8172b3d6	test_simple_rl/reward providers pass tf/torch	4 years ago
Arthur Juliani	0d2f8887	Merge remote-tracking branch 'origin/master' into goal-conditioning # Conflicts: # ml-agents-envs/mlagents_envs/base_env.py # ml-agents-envs/mlagents_envs/rpc_utils.py # ml-agents/mlagents/trainers/tests/mock_brain.py # ml-agents/mlagents/trainers/tests/simple_test_envs.py	4 years ago
GitHub	a0d1c829	Action Docs part2 (#4739) * reduce usage of "vector action" and "action space" * more cleanup * undo GettingStarted change for now * batch size description * Apply suggestions from code review Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com> Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com>	4 years ago
Andrew Cohen	cd73cce2	test_trajectory fixed	4 years ago
vincentpierre	8cb050ef	WIP Made initial changes to enale dimension properties and added attention module	4 years ago
vincentpierre	719c969c	addressing comments. ObservationSpec is no longer a list	4 years ago
vincentpierre	4bba4e8e	Renaming ObservationSpec to SensorSpec	4 years ago
vincentpierre	c5a057d2	renaming obs_spec variables	4 years ago
Andrew Cohen	3457cd3c	save only discrete actions as prev	4 years ago
vincentpierre	449712b0	renaming sensor_spec to sensor_specS	4 years ago
Andrew Cohen	886883b3	Merge branch 'develop-hybrid-action-staging' into develop-hybrid-actions-singleton	4 years ago
Arthur Juliani	0b4b0992	Rename more files	4 years ago
Arthur Juliani	7c37c759	Fix some mis-renamings	4 years ago
Arthur Juliani	0a876b9c	Fix typos	4 years ago
Arthur Juliani	e3de0406	Plurals	4 years ago
GitHub	64fc7f43	Buffer key enums (#4907)	4 years ago
Ruo-Ping Dong	471a2e82	fix tests	4 years ago
Ruo-Ping Dong	c87bce9e	Merge branch 'master' into develop-base-teammanager	4 years ago
Ervin Teng	e112ede0	Fix mock brain	4 years ago
Ervin Teng	44073593	Test for team obs in agentprocessor	4 years ago
Ervin Teng	a81512c9	Test for group and add team reward	4 years ago
Ervin Teng	d4438878	Merge branch 'develop-base-teammanager' into develop-agentprocessor-teammanager	4 years ago
Andrew Cohen	5d517c5e	clean ups	4 years ago
GitHub	d36a5242	Python Dataflow for Group Manager (#4926) * Make buffer type-agnostic * Edit types of Apped method * Change comment * Collaborative walljump * Make collab env harder * Add group ID * Add collab obs to trajectory * Fix bug; add critic_obs to buffer * Set group ids for some envs * Pretty broken * Less broken PPO * Update SAC, fix PPO batching * Fix SAC interrupted condition and typing * Fix SAC interrupted again * Remove erroneous file * Fix multiple obs * Update curiosity reward provider * Update GAIL and BC * Multi-input network * Some minor tweaks but still broken * Get next critic observations into value estimate * Temporarily disable exporting * Use Vince's ONNX export code * Cleanup * Add walljump collab YAML * Lower max height * Update prefab * Update prefab * Collaborative Hallway * Set num teammates to 2 * Add config and group ids to HallwayCollab * Fix bug with hallway collab * E...	4 years ago
Ervin Teng	fd0dd35c	Merge branch 'main' into develop-coma2-trainer	4 years ago
GitHub	f16ce486	Update v2-staging from main (March 15) (#5123)	4 years ago

77 commits (bc1fdf07-41d4-4db9-b0d2-b6cf2261a5da)