* `learn.py` is now the main script for training brains.
* Simultaneous multi-brain training is now possible.
* `ghost-trainer` allows for proper training in adversarial scenarios.
* `imitation-trainer` provides a basic implementation of real-time behavioral cloning.
* All trainer hyperparameters now exist in `.yaml` files.
* `PPO.ipynb` removed.
* LSTM model added.
* More dynamic buffer class to handle a greater variety of scenarios.
* Add support for stacking the past n states to allow the network to learn temporal dependencies (a minimal sketch follows this list).
* Add Banana Collector environment for demonstrating partially observable multi-agent environments.
* Add 3DBall Hard, which lacks velocity information in its state representation. Used as a test for the LSTM and state-stacking features.
* Rework Tennis environment to be continuous control and trainable in 100k steps.
* Add ability to seed learning (numpy, tensorflow, and Unity) with `--seed` flag.
* Add `maxStepReached` flag to Agents and Academy.
* Change the way value bootstrapping works in PPO to take advantage of timeouts.
* Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.
* Implement behavioral cloning for continuous/discrete control, feed-forward/recurrent networks, and state/visual observations.
* Re-organize folder structure in anticipation of unitytrainers as a package.
* Create demo environment BananaImitation to validate behavioral cloning.
* Fixes #336
* Fix Basic environment to properly reflect number of states.
* Fix discrete states when using stacked states.
* Add trained model for Basic environment.
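Two of the items above (the LSTM model and state stacking) exist to give the network temporal context. A minimal sketch of how past-state stacking might look, assuming NumPy; the class and method names are hypothetical, not the trainer's actual API:

```python
from collections import deque

import numpy as np

class StateStacker:
    """Hypothetical sketch: keep the past n states and concatenate them
    into a single network input so a feed-forward model can see short
    temporal context."""

    def __init__(self, state_size, n_stack):
        # Start with zero-filled states so the input size is fixed.
        self.buffer = deque(
            [np.zeros(state_size) for _ in range(n_stack)], maxlen=n_stack
        )

    def add(self, state):
        self.buffer.append(np.asarray(state, dtype=np.float32))
        return np.concatenate(self.buffer)
```

With two stacked states, a 3DBall Hard agent can in principle recover velocity from consecutive positions, which is why that scene serves as a test for this feature.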
* On Demand Decision: use `RequestDecision` and `RequestAction`
* New Agent Inspector: use it to set On Demand Decision
* New BrainParameters interface
* LSTM memory size is now set in Python
* New C# API
* Semantic Changes
* Replaced RunMDP
* New Bouncer Environment to test On Demand Decision
* Add config for crawler and change crawler scene
* Changed number of crawlers in scene to 12
* Changed max steps for crawlers to 5000
* New hyperparameters and a newly trained crawler model
* Clean up crawler code and improve efficiency
* [New Bouncer] Revamped the Bouncer to be in 3D
* [Bouncer Configuration file] Added the BouncerBrain configuration
* [Documentation] Added the Bouncer to the documentation page
* [Fixes] Fixed lines that were too long and a documentation typo
* Slight adjustments to bouncer environment
* Don't default to internal brain on bouncer
RayPerception moved to a component that is now used by Banana, Soccer, Hallway, and Push Block.
Converted Push Block to use RayPerception for local perception and retrained model.
Re-worked Hallway to be more extensible.
Fixes the following issues:
* Missing component reference in the BananaRL environment.
* The neural network for multiple visual observations was not generated properly.
* Episode time-out value-estimate bootstrapping used the incorrect observation as input (a sketch of timeout-aware bootstrapping follows).
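The time-out fix above concerns how PPO bootstraps value estimates when an episode is cut off by `maxStepReached` rather than ending naturally. A minimal sketch of the idea, assuming NumPy; the function and flag names are illustrative:

```python
import numpy as np

def discounted_returns(rewards, last_value, done, max_step_reached, gamma=0.99):
    """If the episode timed out (max_step_reached), bootstrap from the
    value estimate of the final observation instead of zero, since the
    episode did not actually terminate."""
    running = last_value if (max_step_reached or not done) else 0.0
    returns = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```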
This PR makes the following changes:
* Moves clipping of the continuous control model into the model itself. Output is now always in [-1, 1].
* Internal model values are now clipped to [-3, 3] before being rescaled to [-1, 1] for output. This improves training performance by providing a wider range of values within which the pdf of the Gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly.
* Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance.
* Introduce a ScaleAction() function within Python to easily rescale values from [-1, 1] to an arbitrary range (see the sketch after this list).
* Re-train all CC models using the improved algorithm. All performance levels are equal or improved; in the case of Crawler, the improvement is drastic.
* Update documentation appropriately.
* Made miscellaneous minor code style and optimization improvements within environments.
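The two numeric conventions above can be summarized in a few lines. A minimal sketch, assuming NumPy; ScaleAction's real signature is not given in these notes, so the one below is illustrative:

```python
import numpy as np

def clip_and_rescale(raw_output):
    """Model-side convention: clip internal values to [-3, 3], then
    rescale so emitted actions always lie in [-1, 1]."""
    return np.clip(raw_output, -3.0, 3.0) / 3.0

def scale_action(action, low, high):
    """Illustrative ScaleAction() helper: map an action from [-1, 1]
    to the arbitrary range [low, high]."""
    return low + 0.5 * (np.asarray(action) + 1.0) * (high - low)
```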
* Adds an implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to the PPO trainer (a sketch of the intrinsic reward follows this list).
* To enable it, set the `use_curiosity` flag to true in the hyperparameter file.
* Includes a refactor of the unitytrainers model code to accommodate the new feature.
* Adds a new Pyramids environment (with documentation). The environment contains a sparse reward and can only be solved using PPO+Curiosity.
* Revamps agent code for walker and crawler environments to use shared JointDriveController system.
* Crawler has been reworked to be very cute.
* Crawler & Walker environments have been reworked to be visually consistent.
* Added Dynamic Crawler scene.
* All scenes re-trained and new models added.
* Documentation changes.
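For reference, the curiosity signal from Pathak et al. (2017) rewards the agent in proportion to the forward model's prediction error in a learned feature space. A minimal NumPy sketch; the feature encodings and the eta scale are assumed inputs:

```python
import numpy as np

def curiosity_reward(phi_next_pred, phi_next, eta=0.01):
    """Intrinsic reward: half the squared forward-model prediction error
    in feature space, scaled by eta. Inputs are assumed to be batches of
    encoded next-state features."""
    return 0.5 * eta * np.sum(np.square(phi_next_pred - phi_next), axis=-1)
```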
* Added missing declaration to docs sample code.
* Added the pretrained model as the default graph in the Internal brain of the Tennis scene
* Disabled PlayerBrain in Tennis by default.
* Removed accidental config.
* Updated the Pyramids model and changed max_steps to reflect the new number of steps required to achieve ~1.8 cumulative reward
* Adding model for 3D Balance Ball.
* Adding LearningBrain to the Broadcast Hub.
* Removed CrawlerPlayer Brain
* Renamed CrawlerLearning -> CrawlerStaticLearning
* Update Hallway models
* Attaching model to brain for Hallway
* Attaching model to 3DBall Brain.
* Updated CrawlerLearning -> CrawlerStaticLearning in the trainer config.
* Adding Reacher model
* Remove model specification in Hallway Brain asset
* Removing model specification from 3Dball scene
* Adding crawler model file
* Specifying learning brain as default for crawler
* Create a new class (RewardSignal) that represents a reward signal (a sketch follows this list).
* Add value heads for each reward signal in the PPO model.
* Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal.
* Move extrinsic and curiosity rewards into this new structure.
* Allow defining multiple reward signals in YAML file. Add documentation for this new structure.
* Don't use a 0 value bootstrap for GAIL and Curiosity
* Add gradient penalties to the GAN to help with stability (a sketch follows this list).
* Add gail_config.yaml with GAIL examples
* Cleaned up trainer_config.yaml and removed unnecessary gammas
* Documentation updates
* Code cleanup
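A minimal sketch of the reward-signal abstraction described above; the class and method names are illustrative, not the trainer's actual interface:

```python
class RewardSignal:
    """Hypothetical base class: each signal owns its own discount (gamma)
    and weight (strength), and the PPO model keeps a separate value head
    per signal."""

    def __init__(self, gamma, strength):
        self.gamma = gamma
        self.strength = strength

    def evaluate(self, batch):
        """Return the per-step rewards this signal contributes."""
        raise NotImplementedError

class ExtrinsicSignal(RewardSignal):
    def evaluate(self, batch):
        # The extrinsic signal passes environment rewards through.
        return self.strength * batch["env_rewards"]
```

Curiosity and GAIL then become further subclasses, and the summaries can log each signal's weighted reward independently.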
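The gradient penalty mentioned above is, in spirit, the WGAN-GP stabilizer (Gulrajani et al., 2017). A TensorFlow 1.x-style sketch; the discriminator callable and batch shapes are assumptions:

```python
import tensorflow as tf

def gradient_penalty(discriminator, expert_batch, policy_batch, weight=10.0):
    """WGAN-GP-style penalty: push the discriminator's gradient norm
    toward 1 on random interpolations between expert and policy batches.
    `discriminator` is an assumed callable returning a scalar per sample."""
    alpha = tf.random_uniform(tf.stack([tf.shape(expert_batch)[0], 1]))
    interpolated = alpha * expert_batch + (1.0 - alpha) * policy_batch
    grads = tf.gradients(discriminator(interpolated), [interpolated])[0]
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-8)
    return weight * tf.reduce_mean(tf.square(grad_norm - 1.0))
```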
* new env styles rebased on develop
* added new trained models
* renamed food collector platforms
* reduce training timescale on WallJump from 100 to 10
* uncheck academy control on walljump
* new banner image
* rename banner file
* new example env images
* add foodCollector image
* change Banana to FoodCollector and update image
* change bouncer description to include green cube
* update image
* update gridworld image
* cleanup prefab names and tags
* updated soccer env to reference purple agent instead of red
* remove unused mats
* rename files
* remove more unused tags
* update image
* change platform to agent cube
* update text. change platform to agent's head
* cleanup
* cleaned up weird unused meta files
* add new wall jump nn files and rename a prefab
* walker change stacked states from 5 to 1
Walker collects physics observations, so stacked states are not needed.
* 1 to 1 Brain to Agent
This is a work in progress
In this PR :
- Deleted all Brain Objects
- Moved the BrainParameters into the Agent
- Gave the Agent a Heuristic method (see Balance Ball for example)
- Modified the Communicator and ModelRunner: `Put` can only take one agent at a time
- Made the IBrain interface with RequestDecision and DecideAction methods
No changes made to Python
[Design Doc](https://docs.google.com/document/d/1hBhBxZ9lepGF4H6fc6Hu6AW7UwOmnyX3trmgI3HpOmo/edit#)
* Removing editorconfig
* Updating Balance Ball scene
* Fixed a grammar mistake
* Clearing the Agents of the Model runner
* Added Documentation on IBrain
* Modified comments on GiveModel
* Introduced a factory
* Split Learning Brain in two
* Changes to walljump
* Fixing the Unit tests
* Renaming the Brain to Policy
* Heuristic now has priority over training
* Edited code comments
* Fixing bugs
* Develop one to one scene edits...