GitHub
|
ba2af269
|
[coma2] Make group extrinsic reward part of extrinsic (#5033)
* Make group extrinsic part of extrinsic
* Fix test and init
* Fix tests and bug
* Add baseline loss to TensorBoard
|
4 years ago |
Andrew Cohen
|
4c56e6ad
|
lstm runs with coma
|
4 years ago |
Andrew Cohen
|
81524ee8
|
lstm almost runs
|
4 years ago |
Andrew Cohen
|
8f799687
|
ignoring precommit, grabbing baseline/critic mems from buffer in trainer
|
4 years ago |
Andrew Cohen
|
67beef88
|
finished evaluate_by_seq, does not run
|
4 years ago |
Andrew Cohen
|
131fa328
|
initial evaluate_by_seq, does not run
|
4 years ago |
Ervin Teng
|
fd0dd35c
|
Merge branch 'main' into develop-coma2-trainer
|
4 years ago |
Andrew Cohen
|
43955c5b
|
get value estimate test
|
4 years ago |
GitHub
|
c9c7e3d0
|
Faster NaN masking, fix masking for visual obs (#5015)
* Fix get mask from visual obs, large obs perf imp.
* Bug fix
* Fix typo
|
4 years ago |
Andrew Cohen
|
8562471e
|
add initial coma optimizer tests
|
4 years ago |
Andrew Cohen
|
e2d46ca0
|
Merge branch 'develop-agentprocessor-teammanager' into develop-coma2-trainer
|
4 years ago |
Andrew Cohen
|
5d517c5e
|
clean ups
|
4 years ago |
Ervin Teng
|
bc3d3a95
|
Fix slicing typing and string printing in AgentBufferField
|
4 years ago |
Andrew Cohen
|
9060da06
|
Merge branch 'develop-agentprocessor-teammanager' into develop-coma2-trainer
|
4 years ago |
Andrew Cohen
|
4b58527c
|
checkout ppo/optimizer from main
|
4 years ago |
Andrew Cohen
|
e37c5a98
|
Merge branch 'master' into develop-coma2-trainer
|
4 years ago |
GitHub
|
67e945f0
|
clean ups (#5003)
|
4 years ago |
Ervin Teng
|
4da2e22e
|
Fix Team Cumulative Reward
|
4 years ago |
Ervin Teng
|
4b159789
|
Add PushBlockCollab config and fix some stuff
|
4 years ago |
Ervin Teng
|
c6904f86
|
Group reward function
|
4 years ago |
Ervin Teng
|
b3958a8d
|
Buffer fixes
|
4 years ago |
Ervin Teng
|
a4fcbb63
|
Right loss function for stability, fix some pypi
|
4 years ago |
Ervin Teng
|
9bc88c41
|
Running COMA (not sure if learning)
|
4 years ago |
Ervin Teng
|
08db7c2f
|
Merge branch 'develop-agentprocessor-teammanager' into develop-coma2-trainer-mm
|
4 years ago |
Andrew Cohen
|
98d647de
|
MultiInputNetBody
|
4 years ago |
Andrew Cohen
|
418cc778
|
coma trainer and optimizer
|
4 years ago |
Andrew Cohen
|
3f7d68b8
|
fix test policy
|
4 years ago |
Andrew Cohen
|
00b891df
|
fix sac shared
|
4 years ago |
Andrew Cohen
|
d81d0be3
|
fix agent processor test
|
4 years ago |
Andrew Cohen
|
66742dc8
|
test for SharedActorCritic
|
4 years ago |
Andrew Cohen
|
c74dca9f
|
add SharedActorCritic
|
4 years ago |
Ervin Teng
|
24ee4bd5
|
Merge remote-tracking branch 'origin/develop-critic-optimizer' into develop-critic-optimizer
|
4 years ago |
Andrew Cohen
|
6828713c
|
fix saver test
|
4 years ago |
Andrew Cohen
|
9b92f5fb
|
remove commented code
|
4 years ago |
Ervin Teng
|
c675393c
|
Move value network for SAC to device
|
4 years ago |
Andrew Cohen
|
8efdeeb0
|
make critic a property
|
4 years ago |
Ervin Teng
|
1831044a
|
Update SAC to use separate policy
|
4 years ago |
Andrew Cohen
|
543f22bc
|
fix test_networks
|
4 years ago |
Andrew Cohen
|
3aec18a1
|
fix precommit errors
|
4 years ago |
Andrew Cohen
|
6bd396ee
|
add critic to optimizer, ppo runs
|
4 years ago |
Andrew Cohen
|
f73b9dba
|
update policy to not use critic
|
4 years ago |
Andrew Cohen
|
eeabb974
|
Separate Actor/Critic, remove ActorCritics
|
4 years ago |