132 commits (54c4eb43-8bfc-4e88-8cad-1b01aab4cd7a)

Author SHA1 Message Commit date
Arthur Juliani 6879bae4 Initial optimizer port 4 years ago
Arthur Juliani 7c3bd376 Refactoring policy and optimizer 4 years ago
Arthur Juliani 2e51260a Resolving a few bugs 4 years ago
Arthur Juliani 947f0d32 Slightly closer to running model 4 years ago
Arthur Juliani 3c82bf59 Training runs, but doesn’t actually work 4 years ago
Arthur Juliani 8c6f4696 Fix a couple additional bugs 4 years ago
Arthur Juliani 4a50444f Support discrete actions as well 4 years ago
Arthur Juliani a11a79e4 Continuous and discrete now train 4 years ago
Arthur Juliani a5b5b109 Mulkti-discrete now working 4 years ago
Arthur Juliani 5f936990 Visual observations now train as well 4 years ago
Arthur Juliani 82688e5c GRU in-progress and dynamic cnns 4 years ago
Arthur Juliani 29223931 Fix for memories 4 years ago
Arthur Juliani 1736559f Combine actor and critic classes. Initial export. 4 years ago
Arthur Juliani ca887743 Support tf and pytorch alongside one another 4 years ago
Arthur Juliani 9835d26c Prepare model for onnx export 4 years ago
Arthur Juliani be7e55e1 Use LSTM and fix a few merge errors 4 years ago
Arthur Juliani b7be7f04 Fix bug in probs calculation 4 years ago
Arthur Juliani 3eef9d78 Optimize np -> tensor operations 4 years ago
Ervin Teng 72180f9b Experiment with JIT compiler 4 years ago
Arthur Juliani 9724c9ac Merge master 4 years ago
GitHub 0d80d87a Fix for discrete actions (#4181) 4 years ago
GitHub cde8bd29 Convert List[np.ndarray] to np.ndarray before using torch.as_tensor (#4183) 4 years ago
GitHub 05a11c96 Develop add fire exp framework (#4213) 4 years ago
GitHub a28e2767 Update add-fire to latest master, including Policy refactor (#4263) 4 years ago
Ruo-Ping Dong 6feec58a add Saver class (only TF working) 4 years ago
GitHub 3a982317 [add-fire] Add learning rate and beta/epsilon decay to PyTorch (#4318) 4 years ago
GitHub 7ddfd81f Added Reward Providers for Torch (#4280) 4 years ago
Ruo-Ping Dong 71fe4df6 fix formatting and test 4 years ago
Ruo-Ping Dong d3eb6c46 Merge branch 'develop-add-fire' into develop-add-fire-checkpoint 4 years ago
Ervin Teng eaa59cf4 Use loss masks in PPO. 4 years ago
Ervin Teng a48a0af4 Proper shape of masks 4 years ago
GitHub f374f87a [add-fire] Add LSTM to SAC, LSTM fixes and initializations (#4324) 4 years ago
Ervin Teng 1d4bc99e Proper mask mean for PPO 4 years ago
Ruo-Ping Dong 59cc1a9f Merge branch 'develop-add-fire' into develop-add-fire-checkpoint 4 years ago
Ervin Teng f8b40b9b Don't flatten when there are multiple continuous actions 4 years ago
GitHub 6de31a03 [add-fire] Fix masked mean for 2d tensors (#4364) 4 years ago
vincentpierre 9f51ab14 Saving the reward providers 4 years ago
vincentpierre 108fac9a Replace torch.detach().cpu().numpy() with a utils method 4 years ago
vincentpierre 31750e97 Using item() in place of to_numpy() 4 years ago
GitHub 498934f9 Replace torch.detach().cpu().numpy() with a utils method (#4406) 4 years ago
Ruo-Ping Dong f5dee9d1 jit for continuous control 4 years ago
GitHub 4e93cb6e [torch] Restructure PyTorch encoders (#4421) 4 years ago
GitHub 6f534366 Add torch_utils class, auto-detect CUDA availability (#4403) 4 years ago
Ruo-Ping Dong fb50b0ec add wb 4 years ago
Ervin Teng 3e771cbb Permute visual obs outside of network 4 years ago
Ervin Teng 77c810fb Fix SAC and make utility method 4 years ago
vincentpierre 181bdec0 - 4 years ago
Andrew Cohen 643c8e58 ppo extended 4 years ago
Andrew Cohen 44c9879e action models 4 years ago
Ervin Teng e8431a6d Proper dimensions for entropy, sum before bonus in PPO 4 years ago
Ervin Teng be159ad3 Make entropy reporting same as TF 4 years ago
Andrew Cohen eaecb59e torch utils to and from buffer 4 years ago
GitHub e0ef30a5 [bug-fix] Change entropy computation and loss reporting in Torch to match TF (#4538) 4 years ago
vincentpierre d3d4eb90 Trainer with attention 4 years ago
vincentpierre 7ef3c9a1 Trainer with attention 4 years ago
GitHub b853e5ba Action buffer (#4612) 4 years ago
GitHub 3c96a3a2 Action Model (#4580) 4 years ago
GitHub 85a7c0f7 [bug-fix] Add clipping to PyTorch policy, fix initialization (#4649) 4 years ago
Ervin Teng 2be74856 Double policy loss for no reason 4 years ago
Andrew Cohen 3f771e61 add ActionBuffers and utils 4 years ago
Ervin Teng 7a0ebfbd Pretty broken 4 years ago
Ervin Teng 6c77ac7a Update SAC, fix PPO batching 4 years ago
Andrew Cohen bd917c9c action buffer passes continuous 4 years ago
Andrew Cohen ad951493 debugging discrete 4 years ago
Andrew Cohen fcf6471e 2d discrete passes 4 years ago
Andrew Cohen 056630d7 sac continuous and discrete train 4 years ago
vincentpierre 735fcd52 [WIP] Refactor trainers to use list of obs rather than vec and vis obs 4 years ago
Ervin Teng 6846af21 Multi-input network 4 years ago
Ervin Teng 56dcd75a Get next critic observations into value estimate 4 years ago
vincentpierre c1587bce Solving merge conflicts 4 years ago
GitHub cc6b4564 Multi Directional Walker and Initial Hypernetwork (#4740) 4 years ago
Ervin Teng 25dfd883 Merge branch 'master' into develop-centralizedcritic 4 years ago
GitHub 22658a40 use sensor types to differentiate obs (#4749) 4 years ago
Andrew Cohen 498b1ee6 Merge branch 'develop-action-buffer' into develop-hybrid-actions-singleton 4 years ago
Andrew Cohen e81e68de comms agent and fixed hallway 4 years ago
vincentpierre 44ed3258 Merging master 4 years ago
Andrew Cohen ca5a5194 soccer comms on the cloud 4 years ago
vincentpierre 449712b0 renaming sensor_spec to sensor_specS 4 years ago
Andrew Cohen c843e3d4 hallway collab exps on cloud 4 years ago
Andrew Cohen a20287f7 continuous comms 4 years ago
Andrew Cohen 14ea0ad2 comment out comms in ppo optimizer 4 years ago
Andrew Cohen f57875e0 layer norm 4 years ago
Andrew Cohen bc77c990 layer norm and weight decay with fixed architecture 4 years ago
Ervin Teng 330fc1d0 Merge branch 'master' into develop-centralizedcritic-mm 4 years ago
Andrew Cohen 96c01a63 custom layer norm 4 years ago
GitHub 14129a08 [MLA-470] Barracuda + TF cleanup (#4837) 4 years ago
Andrew Cohen 1bc2ff96 add weight decay to trainers 4 years ago
Arthur Juliani 0b4b0992 Rename more files 4 years ago
Ervin Teng aba633b2 Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm 4 years ago
Ervin Teng 9c3da1b6 New buffer layout, TeamObsUtil, pad dead agents 4 years ago
GitHub 67ad9651 Merge pull request #4825 from Unity-Technologies/sensor-types 4 years ago
Ervin Teng 6b8b3db3 Try subtract marginalized value 4 years ago
Ervin Teng 457b2630 I think it's running 4 years ago
Ervin Teng 3e481f7d Fix issue with team_actions 4 years ago
Ervin Teng 0919a32d Add next action and next team obs 4 years ago
Andrew Cohen 6e1826f8 might be right 4 years ago
vincentpierre 52b011d6 _ 4 years ago
vincentpierre 5f9ea5ea _ 4 years ago
Andrew Cohen feb38012 add lambda return and target network 4 years ago
Andrew Cohen 5741f8f6 no target net 4 years ago
Andrew Cohen a92baab6 add target network back 4 years ago
Andrew Cohen a4c336c2 value estimator 4 years ago
vincentpierre 115e944b adding weight decay for experimentation 4 years ago
Andrew Cohen d1285626 add target net 4 years ago
Andrew Cohen bd341f7f no target, increase lambda 4 years ago
Andrew Cohen fce842aa adding zombie to coma2 brnch 4 years ago
Andrew Cohen 7f491ae7 cloud run with coma2 of held out zombie test env 4 years ago
Andrew Cohen 9af22d30 use only value funcs 4 years ago
Andrew Cohen e3239529 remove target update 4 years ago
Andrew Cohen 2c3147b9 add value clipping 4 years ago
Andrew Cohen 687f411b try again on cloud 4 years ago
Ervin Teng a4eaebcb Add trust region to COMA updates 4 years ago
Ervin Teng 3283b6a1 Remove Q-net for perf 4 years ago
GitHub 64fc7f43 Buffer key enums (#4907) 4 years ago
Ervin Teng adad5183 Weight decay, regularizaton loss 4 years ago
Andrew Cohen 39592650 remove clipping 4 years ago
Ervin Teng 2be83146 Use same network 4 years ago
Ervin Teng ac4dc336 Remove reg loss, still stable 4 years ago
Ervin Teng 64b34759 Black format 4 years ago
Ervin Teng b6f88d6d Merge branch 'develop-base-teammanager' into develop-agentprocessor-teammanager 4 years ago
Andrew Cohen 6bd396ee add critic to optimizer, ppo runs 4 years ago
Andrew Cohen 3aec18a1 fix precommit errors 4 years ago
Andrew Cohen 8efdeeb0 make critic a property 4 years ago
Andrew Cohen c74dca9f add SharedActorCritic 4 years ago
Ervin Teng ae7643b8 Proper critic memories for PPO 4 years ago
Ervin Teng fd3f05b9 Enable GAIL to decay 4 years ago
Ervin Teng e46a86ad Merge branch 'master' into develop-superpush-int 4 years ago
GitHub 338af2ec Move the Critic into the Optimizer (#4939) 4 years ago
GitHub f16ce486 Update v2-staging from main (March 15) (#5123) 4 years ago
GitHub fc5d0a3f [bug-fix] Fix save/restore critic, add test (#5062) 4 years ago
Ervin Teng a9ca7b3b Do burn-in for PPO 4 years ago
vincentpierre 5d384292 forgot one 3 years ago