318 commits (4e05233f-1184-4c57-bee7-435b2b78bdec)

Author SHA1 Message Commit date
Arthur Juliani de700c3a Multi Brain Training and Recurrent state encoder (#166) 7 years ago
GitHub 51621334 State Stacking & Banan Environment (#262) 7 years ago
vincentpierre b7f787f6 bug fix on range of observations 7 years ago
Arthur Juliani 7bf0c888 trainer will raise an error if the memory of the brain is set wrong (#273) 7 years ago
GitHub f8a8b112 Move epsilon generation into graph (#283) 7 years ago
GitHub 36d58cee Add Seeding, MaxStepReached, and Bootstrapping fix (#303) 7 years ago
GitHub e676017b Reorganize learn.py (#302) 7 years ago
GitHub 8317a659 Behavioral Cloning & Trainers Reorg (#328) 7 years ago
GitHub e11dae1d Python Testing & Image Inference Improvements (#353) 7 years ago
eshvk 030ac5c5 [cleanup] Add a new type hint to call a dictionary of BrainInfo objects as an AllBrainInfo. Propagate this hint to all methods. Some pep8 cleanups. 7 years ago
Arthur Juliani c3644f56 Buffer fix for properly masking gradients 7 years ago
GitHub f8d27dc5 Merge branch 'development-0.3' into feature/LSTM2 7 years ago
GitHub 99103b29 Use `curr_brain_info` 7 years ago
GitHub f134016b On Demand Decision (#308) 7 years ago
GitHub dcf58f75 Feature/previous text action (#375) 7 years ago
GitHub a7c9096f [Semantics] Modified the placeholder names (#381) 7 years ago
GitHub 5bdef358 [Fix] Must take mean of entropy to avoid errors what number of agents change during training (#407) 7 years ago
GitHub 848b8a58 Fix PPO regression (#434) 7 years ago
vincentpierre e5a59e9b [Refactor] renamed is_continuous to is_continuous_action and added is_continuous_observation to decrease confusion 7 years ago
eshvk 2d2eb64b [containers] Enables container support for scenes that use visual observations 7 years ago
GitHub e43c069e Merge pull request #547 from Unity-Technologies/develop-feature-docker-improvements 7 years ago
GitHub 237b41f9 Hotfix 0.3.0c (#618) 7 years ago
GitHub 1a449e98 Hotfix 0.3.1b (#637) 7 years ago
vincentpierre 076c8744 Report means instead of totals for losses (#580) 7 years ago
GitHub b2675216 Hotfix 0.3.1b (#656) 7 years ago
GitHub 755be43e [Cold Fix] Making the episode length and mean reward more accurate for the first episode (#657) 7 years ago
GitHub 3b866e9f Use Clipped Gaussian (#649) 7 years ago
Arthur Juliani 9477eaa9 Develop fix cumulative reward (#725) 7 years ago
GitHub 702d98c6 [Fix] The summary writer is now implemented in the abtract trainer class. (#806) 7 years ago
GitHub c17937ef Curiosity Driven Exploration & Pyramids Environments (#739) 7 years ago
vincentpierre a22c0f65 [fixing encoding_size] 7 years ago
Arthur Juliani d7338050 Enable concurrent sessions 7 years ago
Arthur Juliani 5d402be9 Minor Optimizations (#836) 7 years ago
GitHub 8526dcfc Fix for visual observations (#847) 7 years ago
GitHub 0f65e272 [Addresses #842] (#849) 7 years ago
GitHub 47fc38ab Additional Tests & Bug Fixes (#854) 7 years ago
GitHub 6e6e8d96 Fix for CC models w/ RNN and Curiosity (#860) 6 years ago
vincentpierre 4c6439d5 [Attempted fix] 6 years ago
GitHub 6df07946 Fix for Discrete observations + Curiosity (#866) 6 years ago
GitHub 68d6170f Error message when using ODD and Curiosity (#883) 6 years ago
Arthur Juliani 5e48766d Remove discrete observations 6 years ago
Arthur Juliani 195ac934 Merge branch 'develop' into develop-runs 6 years ago
vincentpierre e47cec56 [Initial Commit] 6 years ago
unityjeffrey 0d67f311 changed ml agents to ml-agents 6 years ago
unityjeffrey 19fb437a changed to Unity ML-Agents Toolkit (english) 6 years ago
Arthur Juliani 9701c3db Merge branch 'hotfix-0' into release-v0.4-fix-curiosity-odd 6 years ago
Arthur Juliani 0c6411c2 Use switch between old and new behavior 6 years ago
Arthur Juliani 1bfbf67a Simplify approach 6 years ago
Arthur Juliani cfb7cfef Code clean-up 6 years ago
Arthur Juliani 083cbff5 Add to docstring 6 years ago
Arthur Juliani c31f63b5 Fix typo 6 years ago
GitHub e50ac7ae Merge branch 'develop' into hotfix-0 6 years ago
Deric Pang 8380f2f2 Moved curriculum code out of environment code. 6 years ago
Arthur Juliani 1eb701af Merge remote-tracking branch 'origin/develop' into develop-value-estimates-ppo 6 years ago
Arthur Juliani f52d5a92 Merge remote-tracking branch 'origin/develop' into develop-runs 6 years ago
GitHub ef3025e6 Merge pull request #1004 from Unity-Technologies/develop-runs 6 years ago
GitHub 7d0990cf Fix MultiBrain bug that was introduced with the value estimates (#1018) 6 years ago
Arthur Juliani 52865022 [Fix bug 1040] (#1062) 6 years ago
Arthur Juliani 3659bbcd Develop multi discrete (#1022) 6 years ago
Arthur Juliani fee02a84 Attempted fix for #1059 (#1089) 6 years ago
Deric Pang 634280a6 Fixed imports, all tests are passing. 6 years ago
Arthur Juliani 17224292 Fix for Curiosity with ODD (#1107) 6 years ago
GitHub ded0d8c7 Develop action masking (#1080) 6 years ago
Deric Pang e55b1764 Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure 6 years ago
Deric Pang e0e02ae6 Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure 6 years ago
Deric Pang cdb41480 Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure 6 years ago
GitHub fbf92810 Refactor Trainers to use Policy (#1098) 6 years ago
GitHub 10d2a19d Release v0.5 (Develop) (#1203) 6 years ago
GitHub 29084e77 Curriculum learning reward thresholding bug fix (#1141) 6 years ago
GitHub d2c320dd Remove graph scope (#1205) 6 years ago
GitHub 3c9603d6 Demonstration Recorder (#1240) 6 years ago
GitHub 840417ff Use organized tags for tensorboard stats (#1248) 6 years ago
GitHub 6c354d16 New Learning Brain (#1303) 6 years ago
GitHub b6c97cb6 Fix for divide-by-zero error with Discrete Actions (#1520) 6 years ago
GitHub c258b1c3 Move 'take_action' into Policy class (#1669) 6 years ago
eshvk cc9bdf17 Added logging per Brain of time to update policy, time elapsed during training, time to collect experiences, buffer length, average return 6 years ago
eshvk fb04c40c Reorganize to make metrics collection more accurate 6 years ago
GitHub 93760bc4 Adds SubprocessUnityEnvironment for parallel envs (#1751) 6 years ago
eshvk ef8009d9 Python code reformat via [`black`](https://github.com/ambv/black). 6 years ago
Vincent(Yuan) Gao a15763f8 Clear cumulative_returns_since_policy_update (#2120) 5 years ago
GitHub a4d5b2d3 Doc/comment cleanup - Fix some occurrences of 'the the' (#2119) 5 years ago
GitHub 2671e1a0 Enable mypy in precommit checks (#2177) 5 years ago
GitHub 4ac79742 Refactor reward signals into separate class (#2144) 5 years ago
Jonathan Harper 177ee5b8 Remove unused "last reward" logic, TF nodes 5 years ago
GitHub b05c9ac1 Add environment manager for parallel environments (#2209) 5 years ago
Chris Elion bb7773c1 add flake8 to precommit 5 years ago
GitHub 9c50abcf GAIL and Pretraining (#2118) 5 years ago
GitHub 1c18bd18 Swap 0 set and reward buffer append (#2273) 5 years ago
GitHub a5b7cf95 Fix get_value_estimate and buffer append (#2276) 5 years ago
Chris Elion 5d07ca1f Merge remote-tracking branch 'origin/develop' into enable-flake8 5 years ago
GitHub be4292fb Add different types of visual encoder (nature cnn/resnet) 5 years ago
GitHub 9eb3f049 Cleanup unused code in TrainerController (#2315) 5 years ago
GitHub 6225317d refactor vis_encoder_type and add to doc 5 years ago
GitHub a9fe719c Add Multi-GPU implementation for PPO (#2288) 5 years ago
GitHub d7ebaae1 Return list instead of np array for make_mini_batch() (#2371) 5 years ago
GitHub 7b69bd14 Refactor Trainer and Model (#2360) 5 years ago
GitHub bd7eb286 Update reward signals in parallel with policy (#2362) 5 years ago
GitHub 689765d6 Modification of reward signals and rl_trainer for SAC (#2433) 5 years ago
GitHub 43696d60 Fix bug in add_rewards_output and add test (#2442) 5 years ago
GitHub 832e4a47 Normalize observations when adding experiences (#2556) 5 years ago
GitHub 67d754c5 Fix flake8 import warnings (#2584) 5 years ago
Ervin Teng 094cbe4d Fix bug when batch size is a non-multiple of sequence length (#2661) 5 years ago
GitHub 5d3e05d1 Fix "memory leak" during inference (#2722) 5 years ago
GitHub 4da157fe more pylint fixes (#2842) 5 years ago
Ervin Teng 748c250e Somewhat running 5 years ago
Andrew Cohen 13fe9cf8 Bubbled up indexing of AllBrainInfo to trainer controller from trainers 5 years ago
Andrew Cohen e96b80db recieves brain_name and identifier on python side 5 years ago
Ervin Teng df5ee7bf Split buffer into two buffers (PPO works) 5 years ago
Ervin Teng 3a4fa244 Switch to tanh squash in PPO 5 years ago
Ervin Teng fd0647a6 Rename append_update_buffer to append_to_update_buffer 5 years ago
GitHub 652488d9 check for numpy float64 (#2948) 5 years ago
GitHub 213cd68d Split Buffer into processing and update buffers (#2964) 5 years ago
Ervin Teng 2c9376bc Convert to trajectory 5 years ago
Ervin Teng 9e661f0c Looks like it's training 5 years ago
Ervin Teng a97ffb47 Attempt reward reporting 5 years ago
Ervin Teng 9c5fdd31 Stats reporting is working 5 years ago
Ervin Teng eb4a04a5 Merge branch 'master' into develop-tanhsquash 5 years ago
GitHub 3b4b0d55 Remove random normal epsilon (#3039) 5 years ago
Ervin Teng e0e57188 Clean up some stuff 5 years ago
Andrew Cohen 5097bcc0 recieves brain_name and identifier on python side 5 years ago
Ervin Teng f94365a2 No longer using ProcessingBuffer for PPO 5 years ago
Ervin Teng 8b3b9e6c Move trajectory and related functions to trajectory.py 5 years ago
Ervin Teng 76abf968 Add back max_step logic 5 years ago
Andrew Cohen 8578b0b7 add_policy and create_policy separated 5 years ago
GitHub 36048cb6 Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. 5 years ago
Ervin Teng c9116ed2 Move some common logic to buffer class 5 years ago
GitHub 90db165f Add --namespace-packages to mypy for mlagents (#3075) 5 years ago
Andrew Cohen 614d276f recieves brain_name and identifier on python side 5 years ago
Andrew Cohen 96922f84 recieves brain_name and identifier on python side 5 years ago
Ervin Teng 27c2a55b Lots of test fixes 5 years ago
Ervin Teng 97d66e71 Remove BootstrapExperience 5 years ago
Ervin Teng 324d217b Move agent_id to Trajectory 5 years ago
Ervin Teng 77ff4822 Add back next_obs 5 years ago
Andrew Cohen d1edbf43 add_policy and create_policy separated 5 years ago
Ervin Teng 2b811fc8 Properly report value estimates and episode length 5 years ago
GitHub 2fd305e7 Move add_experiences out of trainer, add Trajectories (#3067) 5 years ago
Andrew Cohen de902fbb passes all pytest and C# tests 5 years ago
GitHub 2ac242f7 Remove TrainerMetrics and add CSVWriter using new StatsWriter API (#3108) 5 years ago
Ervin Teng fdf9aea7 Make conversion methods part of NamedTuples 5 years ago
Ervin Teng 6242b67d Add way to check if trajectory is done or max_reached 5 years ago
GitHub 0b5b1b01 Develop magic string + trajectory (#3122) 5 years ago
GitHub c7da0139 Fix mypy errors in trainer code. (#3135) 5 years ago
Andrew Cohen 082789ea Merge branch 'master' into develop-magic-string 5 years ago
Andrew Cohen 6a4e7cf9 added ppo/sac_policy attributes to keep up with master 5 years ago
Ervin Teng 1bd791e5 Merge branch 'master' into develop-agentprocessor 5 years ago
Andrew Cohen 3e76adbd fixing more ci tests 5 years ago
Ervin Teng e577d5ea Fix some mypy issues and remove unused code 5 years ago
Andrew Cohen c3a92afa fixing ci ppo_policy 5 years ago
Ervin Teng 9e0ef912 Fixed value estimate bug 5 years ago
GitHub bec2e8f0 Add Trajectory/Policy Queues, move Trainer logic to advance() (#3113) 5 years ago
Ervin Teng db743971 Move private methods out of trainer, simplify interface 5 years ago
Ervin Teng b3a4e641 Remove some vestigial code 5 years ago
Ervin Teng 48793ec1 Fix test 5 years ago
GitHub 5bc7531b Get step from policy (#3223) 5 years ago
Ervin Teng cd74e51b More progress 5 years ago
Ervin Teng 76ad64d7 Some more bugfixes 5 years ago
Ervin Teng 29f3330f Merge master into hotfix-0.13.1 5 years ago
GitHub 329b23e0 Fix extra summary being written when loading from checkpoint (#3272) 5 years ago
Ervin Teng 164732a9 Move optimizer creation to Trainer, fix some of the reward signals 5 years ago
Ervin Teng 151e3b1c Move policy to common location, remove epsilon 5 years ago
Ervin Teng db249ceb Merge branch 'master' into develop-splitpolicyoptimizer 5 years ago
Ervin Teng edeceefd Zeroed version of LSTM working for PPO 5 years ago
Ervin Teng cfc2f455 Fix BC and tests 5 years ago
Ervin Teng 78671383 Move initialization call around 5 years ago
GitHub dd86e879 Separate out optimizer creation and policy graph creation (#3355) 5 years ago
Ervin Teng 00017bab Temporarily remove multi-GPU 5 years ago
Ervin Teng be9d772e Add option to not condition sigma on obs 5 years ago
Ervin Teng 88998fc9 Add add_policy docstrings 5 years ago
GitHub e4177de0 [change] Organize trainer files a bit better (#3538) 5 years ago
GitHub cb153a0f [change] Change warning language when adversarial scene is used without self-play (#3561) 5 years ago
GitHub c42a11c3 [change] Throw a proper error when sequence length is greater than batch size. (#3583) 5 years ago
GitHub ec278616 Hotfixes for Release 0.15.1 (#3698) 5 years ago
GitHub 6709a9bf [change] Clean up trainer interface, clean up GhostTrainer stats (#3634) 5 years ago
Andrew Cohen 9f09a65d team id centric ghost trainer 5 years ago
GitHub 4ecd6ad3 Fix how we set logging levels (#3703) 5 years ago
Andrew Cohen 59b88be6 Merge branch 'master' into self-play-mutex 5 years ago
Andrew Cohen 3de78baa wrapped trainer has internal policy ghost 5 years ago
Andrew Cohen 3013774b alternative to internal-policy fix 5 years ago
Ervin Teng f29b17a9 Don't block one policy queue 5 years ago
Anupam Bhatnagar eb9f3f19 [skip ci] replace buffer length by buffer size 5 years ago
Anupam Bhatnagar ac80ec82 [skip ci] increment steps on training 5 years ago
Anupam Bhatnagar d49ceecc [skip ci] moving summary writer to update_policy 5 years ago
Anupam Bhatnagar 95ba923d [skip ci] fix first summary statement output 5 years ago
Ervin Teng 5e980ec1 Merge branch 'master' into develop-sac-apex 5 years ago
Anupam Bhatnagar 45bac63e [skip ci] more fixes 5 years ago
Anupam Bhatnagar 9d7dd3b6 [skip ci] moving step increment to trainer from environment for sac 5 years ago
Arthur Juliani 7c3bd376 Refactoring policy and optimizer 5 years ago
Arthur Juliani 3c82bf59 Training runs, but doesn’t actually work 5 years ago
Arthur Juliani 8c6f4696 Fix a couple additional bugs 5 years ago
Arthur Juliani 61d671d8 Add conditional sigma for distribution 5 years ago
Arthur Juliani 212e2d1d Merge remote-tracking branch 'origin/master' into develop-add-fire 5 years ago
GitHub 232519e4 [refactor] Move output artifacts to a single results/ folder (#3829) 5 years ago
Arthur Juliani ca887743 Support tf and pytorch alongside one another 5 years ago
GitHub 422247a0 update versions for patch release (#3970) 5 years ago
GitHub 4641038e Renaming max_step to interrupted in TermialStep(s) (#3908) 5 years ago
Arthur Juliani 89ad3020 Merge remote-tracking branch 'origin/master' into develop-add-fire 5 years ago
Christopher Goy ba80b292 format files with pre-commit. 4 years ago
GitHub e274bcf6 Update precommit flake8 (#3961) 5 years ago
Andrew Cohen 0e965a4d sensitivity 5 years ago
Andrew Cohen 23b84dea ignoring commit checks but write to csv 5 years ago
Andrew Cohen 61aa9915 write to csv 5 years ago
Arthur Juliani 28e095e0 Merge remote-tracking branch 'origin/master' into develop-add-fire 5 years ago
Ervin Teng f214836a Changes for speed test 4 years ago
GitHub e92b4f88 [refactor] Structure configuration files into classes (#3936) 5 years ago
GitHub 09853e13 [refactor] Move checkpoint saving into trainer (#4034) 5 years ago
GitHub 7229214c [cleanup] Remove unused param keys (#4067) 4 years ago
GitHub a1c63c4b Release 3 Cherry-pick bug-fixes and doc changes from master (#4102) 4 years ago
Anupam Bhatnagar 4afd8f92 first commit 4 years ago
Anupam Bhatnagar 8b6c19ae [skip ci] adding should_still_train method to ppo 4 years ago
Arthur Juliani 9724c9ac Merge master 4 years ago
Arthur Juliani 46874cc7 ONNX exporting 4 years ago
GitHub 05a11c96 Develop add fire exp framework (#4213) 4 years ago
GitHub 45154f52 Pytorch port of SAC (#4219) 4 years ago
GitHub a28e2767 Update add-fire to latest master, including Policy refactor (#4263) 4 years ago
GitHub 69579611 [refactor] Refactor Actor and Critic classes (#4287) 4 years ago
Ruo-Ping Dong 6feec58a add Saver class (only TF working) 4 years ago
GitHub 93517833 [feature] Fix TF tests, add --torch CLI option, allow run TF without torch installed (#4305) 4 years ago
GitHub 7ddfd81f Added Reward Providers for Torch (#4280) 4 years ago
Ruo-Ping Dong 71fe4df6 fix formatting and test 4 years ago
Ruo-Ping Dong 09a741c8 small improvement 4 years ago
Ruo-Ping Dong 79d89158 Merge branch 'develop-add-fire' into develop-add-fire-checkpoint 4 years ago
GitHub 3bcb029b [refactor] Remove BrainParameters from Python code (#4138) 4 years ago
Ruo-Ping Dong e06812aa fix tests 4 years ago
GitHub 84440f05 Convert checkpoints to .NN (#4127) 4 years ago
GitHub 1f5eb9da add pyupgrade to pre-commit and run (#4239) 4 years ago
GitHub 129f9ddc [MLA-427] make pyupgrade convert f-strings too (#4244) 4 years ago
HH 9e6edb6c try new reward falloff 4 years ago
HH c3c83920 cleanup 4 years ago
Andrew Cohen d8c123a0 Merge branch 'master' into sensitivity 4 years ago
Andrew Cohen 02df39ab ignore precommit 4 years ago
Andrew Cohen fa35292c write hist to tb 4 years ago
GitHub 1b098c9a Refactor TFPolicy and Policy (#4254) 4 years ago
GitHub beb5aca5 [refactor] Make classes except Optimizer framework agnostic (#4268) 4 years ago
Andrew Cohen 06e4356c Merge branch 'master' into sensitivity 4 years ago
GitHub 3f44a0bc cleanup around AdamOptimizer (#4333) 4 years ago
Ruo-Ping Dong d3eb6c46 Merge branch 'develop-add-fire' into develop-add-fire-checkpoint 4 years ago
Ruo-Ping Dong 95858e25 update saver interface and add tests 4 years ago
Ruo-Ping Dong 523248be update 4 years ago
HH 8eaddb61 Merge branch 'master' into hh/develop/loco-walker-variable-speed 4 years ago
Ruo-Ping Dong 409a161c fix bc tests 4 years ago
GitHub 25dc8c3d Add Saver Class to handle all save/load/checkpoint/export work (#4323) 4 years ago
Ervin Teng d65a9326 Merge branch 'master' into develop-add-fire-mm3 4 years ago
Ruo-Ping Dong d57aa9ab Merge branch 'develop-add-fire-mm3' into develop-add-fire-checkpoint 4 years ago
GitHub 8985a040 Removing the experiment script from add fire (#4373) 4 years ago
Andrew Cohen a65d08c7 ghost trainer tests 4 years ago
GitHub 49545ce1 Pytorch ghost trainer (#4370) 4 years ago
Andrew Cohen fcec6734 added comments 4 years ago
GitHub 0d0d2ead [add-fire] Revert unneeded changes back to master (#4389) 4 years ago
Andrew Cohen e7c9ff35 clean up docstrings create policies 4 years ago
Andrew Cohen 039ae17f capitalize Tensorflow 4 years ago
GitHub 1955af9e [feature] Add experimental PyTorch support (#4335) 4 years ago
Ruo-Ping Dong c47ffc20 Rename saver 4 years ago
Ruo-Ping Dong 27fb4270 brain_name to behavior_name 4 years ago
Ruo-Ping Dong f5dee9d1 jit for continuous control 4 years ago
GitHub 6f534366 Add torch_utils class, auto-detect CUDA availability (#4403) 4 years ago
Andrew Cohen 643c8e58 ppo extended 4 years ago
GitHub c188781b [life improvement] Moving Python files around (#4531) 4 years ago
Ervin Teng b3e15d30 Always use separate critic 4 years ago
Andrew Cohen e5f14400 Merge branch 'master' into develop-hybrid-actions-singleton 4 years ago
GitHub a690af74 [refactor] Make PyTorch the default and TensorFlow optional (#4517) 4 years ago
Andrew Cohen 8013e544 ignoring Instance of 'AbstractContextManager' has no 'enter_context' member (no-member) 4 years ago
GitHub cb8e4d25 Add ActionSpec (#4586) 4 years ago
Andrew Cohen 9689cf2c remove *_action_* from function names 4 years ago
GitHub 3c96a3a2 Action Model (#4580) 4 years ago
GitHub 88d3ec3e Merge master into hybrid actions staging branch (#4704) 4 years ago
Ervin Teng 184f27c6 Make buffer type-agnostic 4 years ago
Ervin Teng 0cdb2040 Use tanh squash 4 years ago
Ervin Teng 3b15cc32 Multiprocessing but Stats are quite broken 4 years ago
Ervin Teng 3765c15a Merge branch 'develop-multitype-buffer' into develop-unified-obs 4 years ago
Ervin Teng 7a0ebfbd Pretty broken 4 years ago
Ervin Teng 95bdbba3 Less broken PPO 4 years ago
vincentpierre b863af57 Removing TensorFlow Trainers 4 years ago
vincentpierre 713e65fb removing tensorflow testing for pytest and yamato 4 years ago
vincentpierre 2dd34aa5 Formatting 4 years ago
vincentpierre 735fcd52 [WIP] Refactor trainers to use list of obs rather than vec and vis obs 4 years ago
vincentpierre 93ca1409 fixing the tests 4 years ago
Ervin Teng 56dcd75a Get next critic observations into value estimate 4 years ago
GitHub cc6b4564 Multi Directional Walker and Initial Hypernetwork (#4740) 4 years ago
GitHub 22658a40 use sensor types to differentiate obs (#4749) 4 years ago
Ervin Teng 330fc1d0 Merge branch 'master' into develop-centralizedcritic-mm 4 years ago
Ervin Teng 6b8b3db3 Try subtract marginalized value 4 years ago
Ervin Teng 2203fc0e Bootstrap if teammates not done 4 years ago
Ervin Teng 092ea232 Some more progress - still broken 4 years ago
Ervin Teng 457b2630 I think it's running 4 years ago
Andrew Cohen 6e1826f8 might be right 4 years ago
vincentpierre 52b011d6 _ 4 years ago
Andrew Cohen feb38012 add lambda return and target network 4 years ago
Andrew Cohen 79c658d2 remove normalize advantages 4 years ago
Andrew Cohen a4c336c2 value estimator 4 years ago
Andrew Cohen bd341f7f no target, increase lambda 4 years ago
Andrew Cohen bdd73403 remove prints 4 years ago
Andrew Cohen 8a5d291f use v return 4 years ago
Andrew Cohen fce842aa adding zombie to coma2 brnch 4 years ago
Andrew Cohen 7f491ae7 cloud run with coma2 of held out zombie test env 4 years ago
Andrew Cohen 9af22d30 use only value funcs 4 years ago
Andrew Cohen a3453c5d target of baseline is returns_v 4 years ago
Andrew Cohen 511a9a7e no baseline 4 years ago
Andrew Cohen 95253b47 ntegrate teammate dones 4 years ago
Andrew Cohen 687f411b try again on cloud 4 years ago
Ervin Teng 3aefac39 Use GAE again 4 years ago
GitHub 64fc7f43 Buffer key enums (#4907) 4 years ago
Ervin Teng adad5183 Weight decay, regularizaton loss 4 years ago
Ervin Teng 4fe8d036 Try reduce bias 4 years ago
Ervin Teng 6094613d try reduce bias more 4 years ago
Andrew Cohen 74885bab add local reward to plot 4 years ago
Andrew Cohen c08fefbc reduce initialization weights 4 years ago
Ervin Teng a9116382 Bug fixes 4 years ago
Andrew Cohen 98d647de MultiInputNetBody 4 years ago
Ervin Teng ae7643b8 Proper critic memories for PPO 4 years ago
Ervin Teng 97842f81 Fix non-lstm PPO 4 years ago
Ervin Teng e46a86ad Merge branch 'master' into develop-superpush-int 4 years ago
Ervin Teng 9bc88c41 Running COMA (not sure if learning) 4 years ago
Ervin Teng 2f209c12 Buffer fixes 4 years ago
Ervin Teng 61781a1a Merge branch 'main' into develop-agentprocessor-teammanager 4 years ago
GitHub f16ce486 Update v2-staging from main (March 15) (#5123) 4 years ago
GitHub 47db8ce1 [bug-fix] Fix padding for List entries in buffer (#5046) 4 years ago
GitHub 62314056 Fix ghost curriculum and make steps private (#5098) 4 years ago
Ervin Teng d1c24251 [bug-fix] When agent isn't training, don't clear update buffer (#5205) 4 years ago