Off-policy temporal difference learning with distribution adaptation in fast mixing chains

Date : 1395-10
Article type : Journal