Author | Commit | Message | Age
Ervin Teng | 9b7499a0 | Revert learn.py | 5 years ago
Ervin Teng | edeceefd | Zeroed version of LSTM working for PPO | 5 years ago
Ervin Teng | 8e300036 | Add some typing to optimizer | 5 years ago
Ervin Teng | a5caf4d6 | Remove epsilon from everywhere | 5 years ago
Ervin Teng | 1b6e175c | Fix discrete SAC and clean up policy | 5 years ago
Ervin Teng | 6bbcf2d7 | Add typing to value head creator | 5 years ago
Ervin Teng | b21b3d5c | Use resamp policy for SAC | 5 years ago
Ervin Teng | 28f7608f | Clean up value head creation | 5 years ago
Ervin Teng | db249ceb | Merge branch 'master' into develop-splitpolicyoptimizer | 5 years ago
Ervin Teng | 0ef40c08 | SAC CC working | 5 years ago
Ervin Teng | d9fe2f9c | Unified policy | 5 years ago
Ervin Teng | b61d2fa1 | Fix some typing issues with curiosity | 5 years ago
Ervin Teng | 151e3b1c | Move policy to common location, remove epsilon | 5 years ago
Ervin Teng | abc98c23 | Change reward signal creation | 5 years ago
Ervin Teng | 164732a9 | Move optimizer creation to Trainer, fix some of the reward signals | 5 years ago
Ervin Teng | e912fa47 | Simplify creation of optimizer, breaks multi-GPU | 5 years ago
Ervin Teng | 6baaf980 | Remove PPO model | 5 years ago
Ervin Teng | 3348bcef | Commit init file | 5 years ago
Ervin Teng | 9ad99eb6 | Combined model and policy for PPO | 5 years ago
Ervin Teng | 2b63415e | Clean up policy files | 5 years ago
Ervin Teng | 17dc17e5 | Discrete PPO working | 5 years ago
Ervin Teng | bc04f9dc | Working continuous updates | 5 years ago
Ervin Teng | 76ad64d7 | Some more bugfixes | 5 years ago
Ervin Teng | 2373cae8 | Move methods into common optimizer | 5 years ago
Ervin Teng | cd74e51b | More progress | 5 years ago
Ervin Teng | 91ffde5f | More incremental steps to separation | 5 years ago
Ervin Teng | 6688453b | Move some functionality to optimizer-black | 5 years ago
Ervin Teng | 2c1ef594 | Move some functionality to optimizer-black | 5 years ago
Ervin Teng | 03c750a7 | Move some functionality to optimizer | 5 years ago