107 commits (306f0927-d76a-4ab1-a94f-4e7e156242ba)

Author SHA1 Message Date
GitHub fbf92810 Refactor Trainers to use Policy (#1098) 6 years ago
GitHub 10d2a19d Release v0.5 (Develop) (#1203) 6 years ago
GitHub ab5c49e8 Release v0.5 delete unityagents (#1151) 6 years ago
GitHub d2c320dd Remove graph scope (#1205) 6 years ago
GitHub 6c354d16 New Learning Brain (#1303) 6 years ago
vincentpierre 1045b6e7 Fix continuous curriosity 6 years ago
eshvk ef8009d9 Python code reformat via [`black`](https://github.com/ambv/black). 6 years ago
GitHub 4ac79742 Refactor reward signals into separate class (#2144) 5 years ago
Jonathan Harper 177ee5b8 Remove unused "last reward" logic, TF nodes 5 years ago
GitHub b05c9ac1 Add environment manager for parallel environments (#2209) 5 years ago
GitHub 84d9d622 python timers (#2180) 5 years ago
GitHub 9c50abcf GAIL and Pretraining (#2118) 5 years ago
GitHub a5b7cf95 Fix get_value_estimate and buffer append (#2276) 5 years ago
Chris Elion dfdf7b83 fix whitespace and line breaks 5 years ago
GitHub be4292fb Add different types of visual encoder (nature cnn/resnet) 5 years ago
GitHub 6a212f73 Improvements for GAIL (#2296) 5 years ago
GitHub 6225317d refactor vis_encoder_type and add to doc 5 years ago
GitHub a9fe719c Add Multi-GPU implementation for PPO (#2288) 5 years ago
GitHub d7ebaae1 Return list instead of np array for make_mini_batch() (#2371) 5 years ago
GitHub 7b69bd14 Refactor Trainer and Model (#2360) 5 years ago
GitHub bd7eb286 Update reward signals in parallel with policy (#2362) 5 years ago
GitHub 3683cc1c Enable learning rate decay to be disabled (#2567) 5 years ago
GitHub 832e4a47 Normalize observations when adding experiences (#2556) 5 years ago
GitHub 67d754c5 Fix flake8 import warnings (#2584) 5 years ago
GitHub cb144f20 small mypy cleanup (#2637) 5 years ago
Chris Elion 43e23941 rough pass at tf2 support, needs cleanup 5 years ago
Chris Elion 806c77e4 centralize tensorflow imports 5 years ago
Ervin Teng 12a1e306 start on tf2 policy 5 years ago
Ervin Teng e185844f Start on TF 2 policy 5 years ago
GitHub 0fe5adc2 Develop remove memories (#2795) 5 years ago
Chris Elion 691d21e6 Merge remote-tracking branch 'origin/develop' into try-tf2-support 5 years ago
Chris Elion 73a346cb cleanup 5 years ago
Ervin Teng 987e0e3a Merge tf2 branch 5 years ago
Ervin Teng 748c250e Somewhat running 5 years ago
Ervin Teng 9dbbfd77 Somewhat running 5 years ago
Ervin Teng 5e6de46f Add normalizer 5 years ago
Ervin Teng 5e1c1a00 Tweaks to Policy 5 years ago
Ervin Teng a665daed It's mostly training 5 years ago
Ervin Teng 3eb1e9c2 Pytorch port of continuous PPO 5 years ago
Ervin Teng d46b60b3 Add ReLU to the dense 5 years ago
Ervin Teng ed2c35b9 Remove some comments 5 years ago
Ervin Teng 135a5bb4 Add dummy save methods 5 years ago
GitHub 69d1a033 Develop remove past action communication (#2913) 5 years ago
Ervin Teng 437c6c2f Add dummy save methods 5 years ago
Ervin Teng d983a636 Speed up a bit faster 5 years ago
Ervin Teng 3a4fa244 Switch to tanh squash in PPO 5 years ago
GitHub 681093cf cherry pick PR#3032 (#3066) 5 years ago
Ervin Teng 9e661f0c Looks like it's training 5 years ago
Ervin Teng eb4a04a5 Merge branch 'master' into develop-tanhsquash 5 years ago
GitHub 3b4b0d55 Remove random normal epsilon (#3039) 5 years ago
Ervin Teng f94365a2 No longer using ProcessingBuffer for PPO 5 years ago
Ervin Teng 8b3b9e6c Move trajectory and related functions to trajectory.py 5 years ago
Ervin Teng 88b1123a Merge branch 'master' of github.com:Unity-Technologies/ml-agents into develop-agentprocessor 5 years ago
GitHub 36048cb6 Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. 5 years ago
Ervin Teng c7632aa7 Fix some bugs for visual obs 5 years ago
GitHub 1fa07edb Remove Standalone Offline BC Training (#2969) 5 years ago
Ervin Teng 5ab2563b Fixes for recurrent 5 years ago
Chris Elion fdc810ff move (first pass) 5 years ago
Ervin Teng 27c2a55b Lots of test fixes 5 years ago
Ervin Teng 97d66e71 Remove BootstrapExperience 5 years ago
Ervin Teng 324d217b Move agent_id to Trajectory 5 years ago
Ervin Teng 77ff4822 Add back next_obs 5 years ago
Ervin Teng 2b811fc8 Properly report value estimates and episode length 5 years ago
GitHub 2fd305e7 Move add_experiences out of trainer, add Trajectories (#3067) 5 years ago
Ervin Teng c330f6f6 Merge branch 'master' into develop-agentprocessor 5 years ago
Ervin Teng 1bd791e5 Merge branch 'master' into develop-agentprocessor 5 years ago
GitHub 45010af3 Add stats reporter class and re-enable missing stats (#3076) 5 years ago
GitHub f058b18c Replace BrainInfos with BatchedStepResult (#3207) 5 years ago
Ervin Teng cd74e51b More progress 5 years ago
Ervin Teng 2b63415e Clean up policy files 5 years ago
Ervin Teng 9ad99eb6 Combined model and policy for PPO 5 years ago
Ervin Teng e912fa47 Simplify creation of optimizer, breaks multi-GPU 5 years ago
Ervin Teng 164732a9 Move optimizer creation to Trainer, fix some of the reward signals 5 years ago
Ervin Teng 151e3b1c Move policy to common location, remove epsilon 5 years ago
Ervin Teng d9fe2f9c Unified policy 5 years ago
Ervin Teng 0ef40c08 SAC CC working 5 years ago
Ervin Teng 1b6e175c Fix discrete SAC and clean up policy 5 years ago
Ervin Teng edeceefd Zeroed version of LSTM working for PPO 5 years ago
Ervin Teng 649c4185 Zero out memory 5 years ago
Ervin Teng 7f53bf8b Cleanup LSTM code 5 years ago
Ervin Teng 4871f49c Fix comments for PPO 5 years ago
Ervin Teng cfc2f455 Fix BC and tests 5 years ago
Ervin Teng 78671383 Move initialization call around 5 years ago
Ervin Teng cadf6603 Fix SAC CC and some reward signal tests 5 years ago
GitHub dd86e879 Separate out optimizer creation and policy graph creation (#3355) 5 years ago
Ervin Teng 1f094da9 Fix policy's scoping 5 years ago
Ervin Teng cdd57468 Re-fix scoping and add method to get all variables 5 years ago
Ervin Teng 2eda5575 Fix discrete scoping 5 years ago
Ervin Teng 1407db53 Fix Barracuda export for LSTM 5 years ago
Ervin Teng 328476d8 Move check for creation into nn_policy 5 years ago
Ervin Teng 7d5c1b0b Add docstring and make some methods private 5 years ago
Ervin Teng 441e6a0c Add typing to optimizer, rename self.tf_optimizer 5 years ago
Ervin Teng ffdc41bb Removed floating constants 5 years ago
Ervin Teng 8abd4129 Clean up nn_policy 5 years ago
Ervin Teng 7c0fa1c4 Remove action_holder placeholder 5 years ago
Ervin Teng c9fbb111 Fix entropy calculation 5 years ago
Ervin Teng be9d772e Add option to not condition sigma on obs 5 years ago
Ervin Teng 0ab7aa58 Fix tensor names 5 years ago
Ervin Teng 1cfc461a Remove and rename tf_optimizer 5 years ago
Ervin Teng 63463bd1 Make TF graph seed deterministic 5 years ago
Ervin Teng 14f2a7f2 Rename LearningModel to ModelUtils 5 years ago
Ervin Teng 1156b9b3 Merge branch 'develop-splitpolicyoptimizer' into develop-removeactionholder 5 years ago
Ervin Teng d57124b4 Merge 'master' into develop-removeactionholder 5 years ago
Ervin Teng d6eb262c Rename resample to reparameterize 5 years ago
Ervin Teng 242e2421 Move encoder creation to separate function 5 years ago
Ervin Teng 53c25fb1 Move one-hot out of policy and remove selected_actions 5 years ago
Ervin Teng a73704bc Remove previous action from policy 5 years ago