





This should increase while the agent is learning, and then decrease once the
reward stabilizes.







 `Losses/Forward Loss` (PPO/SAC+Curiosity)  The mean magnitude of the forward
model loss function. Corresponds to how well the model is able to predict the
new observation encoding.



 `Losses/Inverse Loss` (PPO/SAC+Curiosity)  The mean magnitude of the inverse
model loss function. Corresponds to how well the model is able to predict the
action taken between two observations.
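
To make the two Curiosity statistics more concrete, here is a minimal sketch of what a forward and an inverse model loss could measure. It is an illustration under stated assumptions, not the trainer's actual code: the function names `forward_loss` and `inverse_loss` are hypothetical, observation encodings are plain lists of floats, actions are assumed to be discrete, and the exact scaling used by the trainer may differ.

```python
import math


def forward_loss(predicted_next, actual_next):
    """Mean squared distance between predicted and actual next-observation
    encodings (hypothetical helper; one encoding per sample in the batch)."""
    per_sample = [
        sum((p - a) ** 2 for p, a in zip(pred, act))
        for pred, act in zip(predicted_next, actual_next)
    ]
    return sum(per_sample) / len(per_sample)


def inverse_loss(action_probs, actions_taken):
    """Mean negative log-probability the inverse model assigned to the action
    that was actually taken (assumes discrete actions; hypothetical helper)."""
    per_sample = [
        -math.log(probs[action])
        for probs, action in zip(action_probs, actions_taken)
    ]
    return sum(per_sample) / len(per_sample)


# Two samples: the first prediction is off by 1 in one dimension,
# the second is off by 2 in one dimension.
fwd = forward_loss([[0.0, 0.0], [1.0, 1.0]], [[1.0, 0.0], [1.0, 3.0]])

# The inverse model is undecided between two actions, so the loss is log(2).
inv = inverse_loss([[0.5, 0.5]], [0])
```

In this sketch, a falling forward loss means the predicted encodings are getting closer to the real ones, and a falling inverse loss means the model assigns higher probability to the action that was actually taken.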






