Hyperparameters

trainer_type: ppo
hyperparameters: {'batch_size': 2024, 'buffer_size': 20240, 'learning_rate': 0.0003, 'beta': 0.005, 'epsilon': 0.2, 'lambd': 0.95, 'num_epoch': 3, 'learning_rate_schedule': 'linear'}
network_settings: {'normalize': True, 'hidden_units': 512, 'num_layers': 3, 'vis_encode_type': 'simple', 'memory': None}
reward_signals: {'extrinsic': {'gamma': 0.995, 'strength': 1.0}}
init_path: None
keep_checkpoints: 5
checkpoint_interval: 500000
max_steps: 10000000
time_horizon: 1000
summary_freq: 30000
threaded: True
self_play: None
behavioral_cloning: None
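
The dictionary-valued entries in this dump are Python literals (single quotes, `True`, `None`), so a JSON parser would reject them. The following is a minimal sketch of reading them back with the standard library's `ast.literal_eval`. The names `RAW_SETTINGS` and `parse_settings` are hypothetical and are not part of any trainer API. The derived update count assumes that each filled buffer is split into minibatches of `batch_size` and swept `num_epoch` times, which is a common PPO arrangement but is not stated in the dump itself.

```python
import ast

# Strings copied verbatim from the dump; each one is a Python literal, not JSON.
RAW_SETTINGS = {
    "hyperparameters": (
        "{'batch_size': 2024, 'buffer_size': 20240, 'learning_rate': 0.0003, "
        "'beta': 0.005, 'epsilon': 0.2, 'lambd': 0.95, 'num_epoch': 3, "
        "'learning_rate_schedule': 'linear'}"
    ),
    "network_settings": (
        "{'normalize': True, 'hidden_units': 512, 'num_layers': 3, "
        "'vis_encode_type': 'simple', 'memory': None}"
    ),
    "reward_signals": "{'extrinsic': {'gamma': 0.995, 'strength': 1.0}}",
}


def parse_settings(raw):
    """Turn each literal string into a real Python object (hypothetical helper)."""
    # literal_eval only accepts literals, so it cannot execute arbitrary code.
    return {key: ast.literal_eval(value) for key, value in raw.items()}


settings = parse_settings(RAW_SETTINGS)
hp = settings["hyperparameters"]

# Assumption: one pass over the buffer uses buffer_size / batch_size minibatches.
minibatches_per_epoch = hp["buffer_size"] // hp["batch_size"]
# Assumption: every buffer is swept num_epoch times before new data is collected.
updates_per_buffer = minibatches_per_epoch * hp["num_epoch"]
```

Under those assumptions, the listed values give 10 minibatches per pass and 30 gradient updates per filled buffer.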