55 commits (92f1315e-c793-432a-b899-36152b4bf137)

Author SHA1 Message Commit date
GitHub 2fd305e7 Move add_experiences out of trainer, add Trajectories (#3067) 5 years ago
GitHub 0b5b1b01 Develop magic string + trajectory (#3122) 5 years ago
GitHub f058b18c Replace BrainInfos with BatchedStepResult (#3207) 5 years ago
GitHub 4641038e Renaming max_step to interrupted in TermialStep(s) (#3908) 5 years ago
GitHub b853e5ba Action buffer (#4612) 4 years ago
GitHub 3c96a3a2 Action Model (#4580) 4 years ago
GitHub 88d3ec3e Merge master into hybrid actions staging branch (#4704) 4 years ago
Andrew Cohen 3f771e61 add ActionBuffers and utils 4 years ago
Andrew Cohen 653de147 fix AgentExperience typing 4 years ago
Ervin Teng 7a0ebfbd Pretty broken 4 years ago
Ervin Teng 15c463cf Add collab obs to trajectory 4 years ago
Ervin Teng f479ce83 Fix bug; add critic_obs to buffer 4 years ago
Andrew Cohen bd917c9c action buffer passes continuous 4 years ago
Andrew Cohen b36fcf16 discrete runs/cont passes 4 years ago
Ervin Teng fdaa8c3d Merge branch 'develop-unified-obs' into develop-centralizedcritic 4 years ago
vincentpierre 735fcd52 [WIP] Refactor trainers to use list of obs rather than vec and vis obs 4 years ago
vincentpierre 93ca1409 fixing the tests 4 years ago
vincentpierre 12619155 added some docstrings 4 years ago
Ervin Teng 56dcd75a Get next critic observations into value estimate 4 years ago
vincentpierre c1587bce Solving merge conflicts 4 years ago
Andrew Cohen 8172b3d6 test_simple_rl/reward providers pass tf/torch 4 years ago
GitHub cc6b4564 Multi Directional Walker and Initial Hypernetwork (#4740) 4 years ago
Ervin Teng 25dfd883 Merge branch 'master' into develop-centralizedcritic 4 years ago
Andrew Cohen cd73cce2 test_trajectory fixed 4 years ago
GitHub 22658a40 use sensor types to differentiate obs (#4749) 4 years ago
vincentpierre 0c81006d addressing comments 4 years ago
Andrew Cohen 5ec3fb98 fix action mask in trajectory 4 years ago
Andrew Cohen e81e68de comms agent and fixed hallway 4 years ago
Andrew Cohen 8071beb6 remove unused line in traj 4 years ago
Andrew Cohen ca5a5194 soccer comms on the cloud 4 years ago
Andrew Cohen 12828bdc remove tau from diff for 4 years ago
Ervin Teng 330fc1d0 Merge branch 'master' into develop-centralizedcritic-mm 4 years ago
Ervin Teng 9c3da1b6 New buffer layout, TeamObsUtil, pad dead agents 4 years ago
Ervin Teng eab7e42a Use NaNs to get masks for attention 4 years ago
Ervin Teng fdf97d99 Add team reward to buffer 4 years ago
Ervin Teng 92fc78a5 Use new trajectory 4 years ago
Ervin Teng 65b866b0 Actions added but untested 4 years ago
Ervin Teng 0919a32d Add next action and next team obs 4 years ago
Andrew Cohen 3a4aa513 COMAA runs 4 years ago
Andrew Cohen feb38012 add lambda return and target network 4 years ago
Chris Elion dbf1c946 WIP 4 years ago
Andrew Cohen 45dd7401 move from average to sum of rewards 4 years ago
GitHub 64fc7f43 Buffer key enums (#4907) 4 years ago
Ervin Teng b21094f1 Use reward sum 4 years ago
Ervin Teng eb13a14a Renaming fest 4 years ago
Ervin Teng a6b4917a Use NamedTuples instead of attrs classes 4 years ago
Ervin Teng a81512c9 Test for group and add team reward 4 years ago
Chris Elion e4f51ca7 Merge remote-tracking branch 'origin/master' into MLA-1734-demo-provider 4 years ago
Ervin Teng d4438878 Merge branch 'develop-base-teammanager' into develop-agentprocessor-teammanager 4 years ago
Ervin Teng e46a86ad Merge branch 'master' into develop-superpush-int 4 years ago
Ervin Teng be45d8c0 Move padding method to AgentBufferField 4 years ago
GitHub d36a5242 Python Dataflow for Group Manager (#4926) 4 years ago
GitHub f16ce486 Update v2-staging from main (March 15) (#5123) 4 years ago
Ervin Teng a9fb37aa Fix reporting of group rewards, CLI print of group 4 years ago
GitHub b9cab453 [perf] Optimizations for performance (#5192) 4 years ago