Off-policy temporal difference learning with distribution adaptation in fast mixing chains. DOI: 10.1007/s00500-017-2490-1. Date: 1395-10. Article type: Journal.