Ervin Teng
|
03c750a7
|
Move some functionality to optimizer
|
5 years ago |
Ervin Teng
|
2c1ef594
|
Move some functionality to optimizer-black
|
5 years ago |
Ervin Teng
|
91ffde5f
|
More incremental steps to separation
|
5 years ago |
Ervin Teng
|
cd74e51b
|
More progress
|
5 years ago |
Ervin Teng
|
2373cae8
|
Move methods into common optimizer
|
5 years ago |
Ervin Teng
|
76ad64d7
|
Some more bugfixes
|
5 years ago |
Ervin Teng
|
bc04f9dc
|
Working continuous updates
|
5 years ago |
Ervin Teng
|
17dc17e5
|
Discrete PPO working
|
5 years ago |
Ervin Teng
|
9ad99eb6
|
Combined model and policy for PPO
|
5 years ago |
Ervin Teng
|
e912fa47
|
Simplify creation of optimizer, breaks multi-GPU
|
5 years ago |
Ervin Teng
|
164732a9
|
Move optimizer creation to Trainer, fix some of the reward signals
|
5 years ago |
Ervin Teng
|
d9fe2f9c
|
Unified policy
|
5 years ago |
Ervin Teng
|
0ef40c08
|
SAC CC working
|
5 years ago |
Ervin Teng
|
28f7608f
|
Clean up value head creation
|
5 years ago |
Ervin Teng
|
edeceefd
|
Zeroed version of LSTM working for PPO
|
5 years ago |
Ervin Teng
|
7f53bf8b
|
Cleanup LSTM code
|
5 years ago |
Ervin Teng
|
5ec49542
|
SAC LSTM isn't broken
|
5 years ago |
Ervin Teng
|
7d616651
|
Add burn-in for memory PPO
|
5 years ago |
Ervin Teng
|
4871f49c
|
Fix comments for PPO
|
5 years ago |
Ervin Teng
|
78671383
|
Move initialization call around
|
5 years ago |
GitHub
|
dd86e879
|
Separate out optimizer creation and policy graph creation (#3355)
|
5 years ago |
Ervin Teng
|
dcbb90e1
|
Fix graph init in ghost trainer
|
5 years ago |
Ervin Teng
|
14720e2d
|
Remove burn-in
|
5 years ago |
Ervin Teng
|
328476d8
|
Move check for creation into nn_policy
|
5 years ago |
Ervin Teng
|
ce110201
|
Add optional burn-in for SAC as well
|
5 years ago |
Ervin Teng
|
cbfbff2c
|
Split optimizer and TFOptimizer
|
5 years ago |
Ervin Teng
|
4d94e180
|
Move optimizer to common folder
|
5 years ago |
Ervin Teng
|
441e6a0c
|
Add typing to optimizer, rename self.tf_optimizer
|
5 years ago |
Ervin Teng
|
ffdc41bb
|
Removed floating constants
|
5 years ago |
Ervin Teng
|
7c0fa1c4
|
Remove action_holder placeholder
|
5 years ago |
Ervin Teng
|
30e4424c
|
Fix PPO optimizer creation
|
5 years ago |
Ervin Teng
|
ff607162
|
Move learning rate reporting
|
5 years ago |
Ervin Teng
|
c735e722
|
Make create critic methods private
|
5 years ago |
Ervin Teng
|
da6daebd
|
Make create losses private
|
5 years ago |
Ervin Teng
|
14f2a7f2
|
Rename LearningModel to ModelUtils
|
5 years ago |
Ervin Teng
|
1156b9b3
|
Merge branch 'develop-splitpolicyoptimizer' into develop-removeactionholder
|
5 years ago |
Ervin Teng
|
53c25fb1
|
Move one-hot out of policy and remove selected_actions
|
5 years ago |