Ervin Teng | dc43b0c6 | Add test for NN policy | 5 years ago
Ervin Teng | 2eda5575 | Fix discrete scoping | 5 years ago
Ervin Teng | cdd57468 | Re-fix scoping and add method to get all variables | 5 years ago
Ervin Teng | 1f094da9 | Fix policy's scoping | 5 years ago
GitHub | dd86e879 | Separate out optimizer creation and policy graph creation (#3355) | 5 years ago
Ervin Teng | 85249afc | Fix SAC scoping | 5 years ago
Ervin Teng | aec5fcc0 | Fix policy tests | 5 years ago
Ervin Teng | cadf6603 | Fix SAC CC and some reward signal tests | 5 years ago
Ervin Teng | 78671383 | Move initialization call around | 5 years ago
Ervin Teng | a6e28cf4 | Fix for visual obs | 5 years ago
Ervin Teng | cfc2f455 | Fix BC and tests | 5 years ago
Ervin Teng | 4871f49c | Fix comments for PPO | 5 years ago
Ervin Teng | 7d616651 | Add burn-in for memory PPO | 5 years ago
Ervin Teng | 08cb91de | Remove __init__ for LearningModel static class | 5 years ago
Ervin Teng | ab9b082a | Fix Hallway summary freq | 5 years ago
Ervin Teng | 9b0b2fed | Reduce memory sizes | 5 years ago
Ervin Teng | 5ec49542 | SAC LSTM isn't broken | 5 years ago
Ervin Teng | 7f53bf8b | Cleanup LSTM code | 5 years ago
Ervin Teng | 9b7499a0 | Revert learn.py | 5 years ago
Ervin Teng | edeceefd | Zeroed version of LSTM working for PPO | 5 years ago
Ervin Teng | 8e300036 | Add some typing to optimizer | 5 years ago
Ervin Teng | a5caf4d6 | Remove epsilon from everywhere | 5 years ago
Ervin Teng | 1b6e175c | Fix discrete SAC and clean up policy | 5 years ago
Ervin Teng | 6bbcf2d7 | Add typing to value head creator | 5 years ago
Ervin Teng | b21b3d5c | Use resamp policy for SAC | 5 years ago
Ervin Teng | 28f7608f | Clean up value head creation | 5 years ago
Ervin Teng | db249ceb | Merge branch 'master' into develop-splitpolicyoptimizer | 5 years ago
Ervin Teng | 0ef40c08 | SAC CC working | 5 years ago
Ervin Teng | d9fe2f9c | Unified policy | 5 years ago
Ervin Teng | b61d2fa1 | Fix some typing issues with curiosity | 5 years ago
Ervin Teng | 151e3b1c | Move policy to common location, remove epsilon | 5 years ago
Ervin Teng | abc98c23 | Change reward signal creation | 5 years ago
Ervin Teng | 164732a9 | Move optimizer creation to Trainer, fix some of the reward signals | 5 years ago
Ervin Teng | e912fa47 | Simplify creation of optimizer, breaks multi-GPU | 5 years ago
Ervin Teng | 6baaf980 | Remove PPO model | 5 years ago
Ervin Teng | 3348bcef | Commit init file | 5 years ago
Ervin Teng | 9ad99eb6 | Combined model and policy for PPO | 5 years ago
Ervin Teng | 2b63415e | Clean up policy files | 5 years ago
Ervin Teng | 17dc17e5 | Discrete PPO working | 5 years ago
Ervin Teng | bc04f9dc | Working continuous updates | 5 years ago
Ervin Teng | 76ad64d7 | Some more bugfixes | 5 years ago
Ervin Teng | 2373cae8 | Move methods into common optimizer | 5 years ago
Ervin Teng | cd74e51b | More progress | 5 years ago
Ervin Teng | 91ffde5f | More incremental steps to separation | 5 years ago
Ervin Teng | 6688453b | Move some functionality to optimizer-black | 5 years ago
Ervin Teng | 2c1ef594 | Move some functionality to optimizer-black | 5 years ago
Ervin Teng | 03c750a7 | Move some functionality to optimizer | 5 years ago