* `learn.py` is now the main script for training brains.
* Simultaneous multi-brain training is now possible.
* `ghost-trainer` allows for proper training in adversarial scenarios.
* `imitation-trainer` provides a basic implementation of real-time behavioral cloning.
* All trainer hyperparameters now exist in `.yaml` files.
* `PPO.ipynb` removed.
* LSTM model added.
* More dynamic buffer class to handle a greater variety of scenarios.
* Add support for stacking the past n states to allow the network to learn temporal dependencies.
* Add Banana Collector environment for demonstrating partially observable multi-agent environments.
* Add 3DBall Hard, which lacks velocity information in its state representation. Used as a test for the LSTM and state-stacking features.
* Rework Tennis environment to be continuous control and trainable in 100k steps.
* Add ability to seed learning (numpy, tensorflow, and Unity) with `--seed` flag.
* Add `maxStepReached` flag to Agents and Academy.
* Change the way value bootstrapping works in PPO to take advantage of timeouts (see the sketch below).
* Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.
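To make the bootstrapping change concrete, here is a minimal sketch for a single trajectory; the `discounted_returns` helper and its signature are hypothetical and not the actual trainer code. When `maxStepReached` is set, the episode was cut off by the step limit rather than genuinely finished, so the return is bootstrapped from the critic's value estimate of the last state instead of assuming zero future reward.

```python
import numpy as np

def discounted_returns(rewards, last_value_estimate, max_step_reached, gamma=0.99):
    """Hypothetical helper: discounted returns for a single trajectory.

    If the episode ended because the step limit was hit rather than at a true
    terminal state, the tail of the return is bootstrapped from the critic's
    value estimate of the final state instead of assuming zero future reward.
    """
    running = last_value_estimate if max_step_reached else 0.0
    returns = np.zeros(len(rewards), dtype=np.float32)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Timed-out episode: the 0.5 value estimate feeds back into earlier returns.
print(discounted_returns([0.0, 0.0, 1.0], last_value_estimate=0.5, max_step_reached=True))
```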
* On Demand Decision: use `RequestDecision` and `RequestAction`
* New Agent Inspector: use it to set On Demand Decision
* New `BrainParameters` interface
* LSTM memory size is now set in Python
* New C# API
* Semantic changes
* Replaced `RunMDP`
* New Bouncer environment to test On Demand Decision
* Enable buffer padding to be set to values other than 0
Allows the padding in an `AgentBufferField` to be set to a custom value. In particular, 0-padding for `action_masks` causes a divide-by-zero error; the masks should be padded with 1's instead. The pad value is passed as a parameter to the `append` method, so it can be set right after an `AgentBufferField` is instantiated. A sketch of the mechanism is below.
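A minimal sketch of the mechanism, assuming a simplified stand-in for the real buffer field (the `pad_to_length` helper is hypothetical; only `append` accepting a pad value comes from the description above):

```python
class AgentBufferField(list):
    """Simplified stand-in for the buffer field described above."""

    def __init__(self):
        super().__init__()
        self.padding_value = 0

    def append(self, element, padding_value=0):
        # The pad value travels with the data, so callers can pick a sensible
        # default (e.g. 1 for action masks) from the very first append.
        super().append(element)
        self.padding_value = padding_value

    def pad_to_length(self, length):
        # Pad with the stored value rather than a hard-coded 0: zero-padding
        # `action_masks` would cause a divide-by-zero when the masks are used
        # to renormalize action probabilities.
        return list(self) + [self.padding_value] * (length - len(self))


masks = AgentBufferField()
masks.append(1.0, padding_value=1.0)
masks.append(0.0, padding_value=1.0)
print(masks.pad_to_length(4))  # [1.0, 0.0, 1.0, 1.0]
```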
- Move common functions to trainer.py and model.py from ppo/trainer.py, ppo/policy.py, and ppo/model.py
- Introduce an RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this; the BC Trainer mostly will not (see the sketch after this list).
- Add methods to Buffer to enable sampling, truncating, and saving/loading.
- Add scoping to create encoders in model.py
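A rough sketch of the intended hierarchy; apart from `RLTrainer`, the class and method names here are illustrative assumptions rather than the final API:

```python
class Trainer:
    """Base interface shared by all trainers."""

    def add_experiences(self, curr_info, next_info, take_action_outputs):
        raise NotImplementedError

    def update_policy(self):
        raise NotImplementedError


class RLTrainer(Trainer):
    """Common RL machinery: experience collection and shared reward-signal
    bookkeeping live here instead of being duplicated per algorithm."""

    def __init__(self):
        # Cumulative rewards tracked per reward signal, per agent.
        self.collected_rewards = {"environment": {}}

    def add_experiences(self, curr_info, next_info, take_action_outputs):
        # Shared bookkeeping (rewards, dones, reward-signal stats) goes here;
        # subclasses add algorithm-specific fields on top.
        ...


class PPOTrainer(RLTrainer):
    def update_policy(self):
        ...  # PPO-specific update


class SACTrainer(RLTrainer):
    def update_policy(self):
        ...  # SAC-specific update


class BCTrainer(Trainer):
    # Behavioral cloning trains from demonstrations, so it skips most of the
    # RL-specific experience handling and inherits directly from Trainer.
    def add_experiences(self, curr_info, next_info, take_action_outputs):
        ...

    def update_policy(self):
        ...
```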
We have been ignoring unused imports and star imports via flake8. Both are bad practice and accumulate over time without automated checking. This commit attempts to fix all existing import errors and to add back the corresponding flake8 checks.
This is the first in a series of PRs intended to move the agent processing logic (`add_experiences` and `process_experiences`) out of the trainer and into a separate class. The plan is to do so in steps:
- Split the processing buffers (which track and assemble per-agent trajectories) from the update buffer (completed trajectories to be used for training) within the Trainer (this PR; a sketch of the split follows this list)
- Move the processing buffer and add/process experiences into a separate, outside class
- Change the data type of the update buffer to be a Trajectory
- Place and read Trajectories from queues, and add a subscription mechanism for both AgentProcessor and Trainers
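A minimal sketch of the split in the first step, with hypothetical names (`processing_buffer`, `update_buffer`, `add_experience`) used only for illustration:

```python
from collections import defaultdict


class TrainerBufferSplit:
    """Illustrative only: keeps in-progress per-agent trajectories separate
    from the buffer of completed trajectories used for policy updates."""

    def __init__(self):
        # One in-progress trajectory per agent id.
        self.processing_buffer = defaultdict(list)
        # Completed experience, ready to be sampled for training.
        self.update_buffer = []

    def add_experience(self, agent_id, experience, done):
        self.processing_buffer[agent_id].append(experience)
        if done:
            # The finished trajectory moves wholesale into the update buffer;
            # later steps in the plan turn this into a Trajectory object
            # passed to the trainer through a queue.
            self.update_buffer.extend(self.processing_buffer.pop(agent_id))
```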
* Added Reward Providers for Torch
* Use `NetworkBody` to encode state in the reward providers (a sketch of this pattern follows the list)
* Integrating the reward providers with PPO and Torch
* Work in progress on the PPO integration; Pyramids is not training properly at the moment
* Integration in PPO
* Removing duplicate file
* GAIL and Curiosity working
* Addressing comments
* Enforce float32 for tests
* Enforce np.float32 in buffer
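A condensed sketch of the pattern these commits describe: a reward provider owns a small Torch network (standing in here for `NetworkBody`) that encodes state and produces an intrinsic reward. The class names and signatures are illustrative, and the reward logic shown is a simple RND-style novelty bonus rather than the actual GAIL or Curiosity providers:

```python
import numpy as np
import torch
from torch import nn


class RewardProvider:
    """Illustrative base class: turns observations into intrinsic rewards."""

    def evaluate(self, obs: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def update(self, obs: np.ndarray) -> None:
        """Train the provider's own networks (e.g. GAIL's discriminator or
        Curiosity's forward/inverse models)."""


class NoveltyRewardProvider(RewardProvider):
    """Rewards unfamiliar states via the error of a trained network trying to
    match a fixed, randomly initialized state encoder."""

    def __init__(self, obs_size: int, encoding_size: int = 64):
        def make_encoder() -> nn.Module:
            # Stand-in for NetworkBody: encodes raw observations.
            return nn.Sequential(
                nn.Linear(obs_size, encoding_size), nn.ReLU(),
                nn.Linear(encoding_size, encoding_size),
            )

        self.target = make_encoder()     # fixed random encoder
        self.predictor = make_encoder()  # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.optimizer = torch.optim.Adam(self.predictor.parameters(), lr=3e-4)

    def _errors(self, obs: np.ndarray) -> torch.Tensor:
        x = torch.as_tensor(obs, dtype=torch.float32)
        return ((self.predictor(x) - self.target(x)) ** 2).mean(dim=1)

    def evaluate(self, obs: np.ndarray) -> np.ndarray:
        with torch.no_grad():
            # The buffer enforces np.float32, so match that dtype here.
            return self._errors(obs).numpy().astype(np.float32)

    def update(self, obs: np.ndarray) -> None:
        loss = self._errors(obs).mean()
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
```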