* test initialize steps to 100
* use mean of first trajectory to initialize the normalizer
* remove blank line
* update changelog
* cleaned up initialization of variance/mean
* large normalization obs unit test
* add --upgrade to pip to get newer downloader (#4338)
* Fix format of the changelog for validation. (#4340)
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Chris Goy <christopherg@unity3d.com>
* Begin porting work
* Add ResNet and distributions
* Dynamically construct actor and critic
* Initial optimizer port
* Refactoring policy and optimizer
* Resolving a few bugs
* Share more code between tf and torch policies
* Slightly closer to running model
* Training runs, but doesn’t actually work
* Fix a couple additional bugs
* Add conditional sigma for distribution
* Fix normalization
* Support discrete actions as well
* Continuous and discrete now train
* Multi-discrete now working
* Visual observations now train as well
* GRU in progress and dynamic CNNs
* Fix for memories
* Remove unused arg
* Combine actor and critic classes. Initial export.
* Support tf and pytorch alongside one another
* Prepare model for onnx export
* Use LSTM and fix a few merge errors
* Fix bug in probs calculation
* Optimize np -> tensor operations
* Time action sample funct...
* Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder
* Addressing comments
* Editing the migrating doc
* fixing test
* use int64 steps
* check for NaN actions
Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
Co-authored-by: Chris Elion <chris.elion@unity3d.com>