Sample-Efficient Online Decision-Making: Bandits (Institute of Systems Science)

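Speaker: Yuzhen Qin (秦语真), Assistant Professor, Radboud University, Nijmegen, the Netherlands
Time: May 12, 2026, 10:00 to 11:00 a.m. Venue: Room N205, South Building, Academy of Mathematics (数学院南楼)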

[Abstract] Many sequential decision-making problems, from clinical trials to adaptive stimulation, involve rewards that materialize only after long delays and environments whose statistical structure shifts over time. I present our work addressing both challenges. First, I introduce contextual bandits with long-horizon rewards and show how to achieve near-optimal regret despite the difficulty of temporal credit assignment. Second, I discuss non-stationary representation learning in linear bandits, where the learner must discover and track a changing low-dimensional feature mapping. I present algorithms that adapt to the rate of non-stationarity and provably outperform static approaches. I emphasize connections to neuroscience applications and discuss open directions, including sparse reward dependence in linear MDPs.

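[Speaker Bio] Dr. Yuzhen Qin is an Assistant Professor in the Department of Machine Learning and Neural Computing at Radboud University, Nijmegen, the Netherlands. He received his bachelor's and master's degrees in automation from Hohai University and Wuhan University in 2012 and 2015, respectively, and his Ph.D. in systems and control from the University of Groningen, the Netherlands, in 2019. From 2020 to 2023 he was a postdoctoral researcher at the University of California, Riverside. His main research interests include control of complex networks, control of nonlinear systems, reinforcement learning, and applications of control theory and machine learning to neuromodulation and closed-loop BCI.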