An Instrumental Variable Approach to Confounded Off-Policy Evaluation

主讲人:Prof. Chengchun Shi(伦敦政治经济学院)
时间:2023年9月2日11:00—12:00   地点:N602

【摘要】Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.

 

【个人介绍】Chengchun Shi is an Associate Professor at London School of Eco- nomics and Political Science. He is serving as the associate editors of JRSSB, JASA (T&M) and Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods in rein- forcement learning, with applications to healthcare, ridesharing, video-sharing and neuroimaging. He was the recipient of the Royal Statistical Society Research Prize in 2021. He also received the IMS travel awards in three years.