Sample-Efficient Online Decision-Making: Bandits

Speaker: Yuzhen Qin (Assistant Professor, Radboud University, Nijmegen, the Netherlands)
Time: May 12, 2026, 10:00 to 11:00 a.m.   Venue: Room N205, South Building, School of Mathematics

[Abstract] Many sequential decision-making problems, from clinical trials to adaptive stimulation, involve rewards that materialize only after long delays and environments whose statistical structure shifts over time. I will present our work addressing both challenges. First, I introduce contextual bandits with long-horizon rewards and show how to achieve near-optimal regret despite the difficulty of temporal credit assignment. Second, I discuss non-stationary representation learning in linear bandits, where the learner must discover and track a changing low-dimensional feature mapping. I present algorithms that adapt to the rate of non-stationarity and provably outperform static approaches. Throughout, I emphasize connections to neuroscience applications, and I close with open directions, including sparse reward dependence in linear MDPs.

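[About the Speaker] Dr. Yuzhen Qin is an Assistant Professor in the Department of Machine Learning and Neural Computing at Radboud University in Nijmegen, the Netherlands. He received his bachelor's degree in automation from Hohai University in 2012 and his master's degree in automation from Wuhan University in 2015, followed by a Ph.D. in systems and control from the University of Groningen, the Netherlands, in 2019. From 2020 to 2023, he was a postdoctoral researcher at the University of California, Riverside. His research interests include control of complex networks, nonlinear control, and reinforcement learning, as well as applications of control theory and machine learning to neuromodulation and closed-loop brain-computer interfaces (BCIs).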