Synthetic Data for Statistical Learning: Prediction, Inference, and Fidelity

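Speaker: Xinyu Tian (田新宇), Postdoctoral Researcher, Department of Statistics, University of Minnesota, USA
Time: May 18, 2026, 10:00-11:00 a.m.   Venue: Room N202, South Building, Academy of Mathematics and Systems Science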

Abstract: Synthetic data is becoming an increasingly important tool for statistical learning, with the potential to expand limited samples, improve learning in sparse regions, and support prediction and inference when repeated observations are scarce. However, its statistical value depends critically on the quality of the underlying generative model: simply generating more samples cannot remove bias if the generator fails to capture the relevant target distribution.

This talk presents a line of work on using synthetic data as a statistical instrument rather than as a generic data augmentation device. The discussion focuses on three roles of synthetic data in statistical learning: adaptive data augmentation, distributional prediction, and generative inference. It also highlights two complementary directions for strengthening this framework: improving generative fidelity through transfer learning from source domains, and developing efficient generative models and sampling methods that make synthetic-data-based procedures computationally practical. Together, these results suggest a broader research agenda of building reliable and efficient synthetic data engines for prediction, inference, and scientific discovery.

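About the speaker: Dr. Xinyu Tian (田新宇) graduated in 2023 from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, advised by Prof. Jian Shi (石坚), and has since been a postdoctoral researcher in the Department of Statistics at the University of Minnesota, USA. Main research interests include the theory and applications of generative models, reliability analysis, and sports data modeling. Multiple papers have been published in or accepted by leading journals in statistics and machine learning, including JASA, AOAS, JMLR, STAT, and Statistica Sinica.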