Speaker: 秦语真 (Assistant Professor, Radboud University, Nijmegen, the Netherlands)
Time: May 12, 2026, 10:00-11:00 a.m. Venue: Room N205, South Building, 数学院
Abstract: Many sequential decision-making problems, from clinical trials to adaptive stimulation, involve rewards that materialize only after long delays and environments whose statistical structure shifts over time. I present our work addressing both challenges. First, I introduce contextual bandits with long-horizon rewards, showing how to achieve near-optimal regret despite temporal credit-assignment difficulties. Second, I discuss non-stationary representation learning in linear bandits, where the learner must discover and track a changing low-dimensional feature mapping. I present algorithms that adapt to the rate of non-stationarity and provably outperform static approaches. I emphasize connections to neuroscience applications and discuss open directions, including sparse reward dependence in linear MDPs.