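"Statistics Lecture Hall" (统计大讲堂) Lecture 151 Announcement: Subsampling Problems in Data Science
2021-04-24
Time: April 28, 2021, 9:00-11:00 a.m.
Format: Tencent Meeting
(Meeting ID: 352 163 401)
Speaker: Ping Ma (马平)
Topic: Subsampling Problems in Data Science
Abstract: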
Rapid advances in science and technology over the past decade have produced an extraordinary amount of data that was inaccessible just a decade ago, offering researchers an unprecedented opportunity to tackle much larger and more complex research challenges. This opportunity, however, has not yet been fully exploited, because effective and efficient statistical and computing tools for analyzing super-large datasets are still lacking. One major challenge is that advances in computing technology still lag far behind the exponential growth of databases. One option is to invent algorithms that make better use of a fixed amount of computing power.
In this talk, I will review an emerging family of subsampling methods developed to achieve this goal. In subsampling methods, we draw a small proportion of the data (a subsample) from the full sample and then perform the intended computations for the full sample using the small subsample as a surrogate. In the classical statistical literature, subsampling has been used to refer to the ‘m-out-of-n’ bootstrap, whose primary motivation is to make approximate inference when analytical expressions are difficult or intractable to derive. The general motivation of subsampling methods in data science differs from that of traditional subsampling. I will present challenges and opportunities.
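As a rough illustration of the surrogate idea described in the abstract, the sketch below fits a simple linear regression on a uniformly drawn subsample and compares it with the full-sample fit. It is not the speaker's method: uniform sampling, the synthetic data, the subsample size `r`, and the helper names `ols_slope_intercept` and `uniform_subsample_fit` are all assumptions made for this example.

```python
import random


def ols_slope_intercept(xs, ys):
    """Closed-form least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uniform_subsample_fit(xs, ys, r, seed=0):
    """Fit on r rows drawn uniformly without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(xs)), r)
    return ols_slope_intercept([xs[i] for i in idx], [ys[i] for i in idx])


# Synthetic "full sample" (assumed for illustration): y = 2x + 1 + noise.
data_rng = random.Random(42)
n = 50_000
xs = [data_rng.uniform(0.0, 10.0) for _ in range(n)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 1.0) for x in xs]

full_slope, full_intercept = ols_slope_intercept(xs, ys)
sub_slope, sub_intercept = uniform_subsample_fit(xs, ys, r=1000)

print("full sample :", full_slope, full_intercept)
print("subsample   :", sub_slope, sub_intercept)
```

The subsample fit touches only 2% of the rows here, at the cost of extra sampling variability. Uniform sampling is only the simplest choice; non-uniform sampling probabilities are another option.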
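Professor Ping Ma is a Distinguished Professor at the University of Georgia and co-director of its Big Data Analytics Lab. He received his Ph.D. from Purdue University in 2003 and was a postdoctoral researcher at Harvard University from 2003 to 2005. From 2005 to 2013 he was an assistant professor and then associate professor at the University of Illinois at Urbana-Champaign. He is a Beckman Chair Professor at the University of Illinois Center for Advanced Study, a Chair Professor at the U.S. National Center for Supercomputing Applications, and a recipient of the U.S. National Science Foundation CAREER Award for outstanding young scientists. His paper won the 2011 best paper award of The Canadian Journal of Statistics. He is the speaker for the 2021 U.S. National Science Foundation Distinguished Lecture in biotechnology. He is a Fellow of the American Statistical Association.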