GitHub
8f35bdd3
POCA trainer (#5005)
Co-authored-by: Ervin Teng <ervin@unity3d.com>
Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
4 years ago
GitHub
62314056
Fix ghost curriculum and make steps private (#5098)
* use get step to determine curriculum
* add to CHANGELOG
* Make step in trainer private (#5099)
Co-authored-by: Ervin T <ervin@unity3d.com>
4 years ago
Ervin Teng
54ffbed6
[cherry-pick] Fix ghost curriculum and make steps private (#5098)
* use get step to determine curriculum
* add to CHANGELOG
* Make step in trainer private (#5099)
Co-authored-by: Ervin T <ervin@unity3d.com>
4 years ago
Andrew Cohen
9176247c
Merge branch 'main' into develop-soccer-groupman-mod
4 years ago
GitHub
e81e038b
Fix end episode for POCA, add warning for group reward if not POCA (#5113)
* Fix end episode for POCA, add warning for group reward if not POCA
* Add missing imports
4 years ago
GitHub
e79d8a9d
[bug-fix] Move POCA critic to default device (#5124)
* Move critic to default device
* Make sure to clone onto default device
* Add some debug stuff
* Some more debug
* Fix issue
* Fix bool tensor too
4 years ago
GitHub
63169e2c
[cherry-pick] Fix group rewards for POCA, add warning for non-POCA trainers (#5120)
* Fix end episode for POCA, add warning for group reward if not POCA (#5113)
* Fix end episode for POCA, add warning for group reward if not POCA
* Add missing imports
* Use np.any, which is faster
4 years ago
GitHub
e6143a83
[bug-fix] Move POCA critic to default device (#5124) (#5131)
* Move critic to default device
* Make sure to clone onto default device
* Add some debug stuff
* Some more debug
* Fix issue
* Fix bool tensor too
4 years ago
Ervin Teng
d1c24251
[bug-fix] When agent isn't training, don't clear update buffer (#5205)
* Don't clear update buffer, but don't append to it either
* Update changelog
* Address comments
* Make experience replay buffer saving more verbose
(cherry picked from commit 63e7ad44d96b7663b91f005ca1d88f4f3b11dd2a)
4 years ago
Ervin Teng
c108da4a
[bug-fix] Fix POCA LSTM, pad sequences in the back (#5206)
* Pad buffer at the end
* Fix padding in optimizer value estimate
* Fix additional bugs and POCA
* Fix groupmate obs, add tests
* Update changelog
* Improve tests
* Address comments
* Fix poca test
* Fix buffer test
* Increase entropy for Hallway
* Add EOF newline
* Fix Behavior Name
* Address comments
(cherry picked from commit 2ce6810846ba9268e4fb5fb082fa54e90414c980)
4 years ago
Andrew Cohen
18be47e8
Merge branch 'main' into develop-soccer-groupman-mod
4 years ago
Ervin Teng
81b74634
Fix additional bugs and POCA
4 years ago
Ervin Teng
c05ec9af
Fix groupmate obs, add tests
4 years ago
Ervin Teng
9fd4a81e
Address comments
4 years ago
GitHub
ff21216d
[bug-fix] When agent isn't training, don't clear update buffer (#5205)
* Don't clear update buffer, but don't append to it either
* Update changelog
* Address comments
* Make experience replay buffer saving more verbose
4 years ago
GitHub
c5589b59
[bug-fix] Fix POCA LSTM, pad sequences in the back (#5206)
* Pad buffer at the end
* Fix padding in optimizer value estimate
* Fix additional bugs and POCA
* Fix groupmate obs, add tests
* Update changelog
* Improve tests
* Address comments
* Fix poca test
* Fix buffer test
* Increase entropy for Hallway
* Add EOF newline
* Fix Behavior Name
* Address comments
4 years ago
vincentpierre
983982ee
Removing misleading learning rate
3 years ago