Q-PROP: SAMPLE-EFFICIENT POLICY GRADIENT WITH AN OFF-POLICY CRITIC (2017)
Attributed to: Autonomous behaviour and learning in an uncertain world (funded by EPSRC)
Abstract
No abstract provided
Bibliographic Information
Publication URI: https://openreview.net/forum?id=SJ3rcZcxl
Type: Conference/Paper/Proceeding/Abstract