Off-policy temporal difference learning with distribution adaptation in fast mixing chains

Article type: Journal
https://people.iut.ac.ir/en/palhang/content/1615749