to meaningful events: losing a life (2,8,10,21), narrowly escaping an enemy (3,5,6,11,12,13,14,15), passing... the smaller spikes correspond to relatively rare events that the agent has nevertheless experienced multiple...

Univ. of Edinburgh, OpenAI
ABSTRACT
We introduce an exploration bonus for deep reinforcement learning methods...

...Comparison of Figures 7 and 9 shows that across multiple games the RNN policy outperforms the CNN more frequently than the other way around.
Other methods of exploration include adversarial self-play (