News & Announcements

Featured | Keynote Preview (IV): The 9th Renmin University of China International Statistics Forum

2023-06-26

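The "Renmin University of China International Statistics Forum", founded in 2004, is dedicated to providing a high-level platform for academic exchange in statistics and has become one of the most influential statistics forums in China.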

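The forum will be held on July 14-15, 2023, and will feature 5 keynote speakers and 6 invited speakers. This installment introduces keynote speaker Xihong Lin. We wish the 9th Renmin University of China International Statistics Forum every success!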

Xihong Lin

Title

Fast Distributed Principal Component Analysis of Large-Scale Federated Data

Abstract

Principal component analysis (PCA) is one of the most popular methods for dimension reduction. With the rapid growth of large-scale data in federated ecosystems, traditional PCA is often inapplicable because of privacy-protection constraints and its heavy computational burden. Several algorithms have been proposed to lower the computational cost, but few can handle both high dimensionality and massive sample sizes in the distributed setting. In this work, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension d and the sample size n are ultra-large, by simultaneously performing parallel computing along d and distributed computing along n. Specifically, we use L parallel copies of p-dimensional fast sketches to divide the computational burden along d, and aggregate the results distributively across the split samples. We present FADI under a general framework applicable to multiple statistical problems and establish comprehensive theoretical results within this framework. We show that FADI enjoys the same non-asymptotic error rate as traditional PCA when Lp ≥ d. We also derive inferential results that characterize the asymptotic distribution of FADI, and demonstrate a phase-transition phenomenon as Lp increases. In addition, we discuss estimating the number of spiked eigenvalues (i.e., the rank of the low-rank component) of a covariance matrix by Bulk Eigenvalue Matching Analysis (BEMA). Extensive simulations show that FADI substantially outperforms existing methods in computational efficiency while preserving accuracy, and numerical experiments validate the distributional phase-transition phenomenon. We apply FADI to the 1000 Genomes Project data to study population structure. This is joint work with Shuting Shen and Junwei Lu.
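The abstract describes FADI only at a high level. As a rough illustration of the general sketch-and-aggregate idea it outlines (L Gaussian sketches of width p applied to the sample covariance, partial results summed across sample splits, and the L subspace estimates merged), here is a minimal Python/NumPy sketch. It rests on our own assumptions: Gaussian sketching matrices, an uncentered covariance X^T X / n, and merging by an SVD of the concatenated bases. The helper names `local_sketch` and `sketched_distributed_pca` are hypothetical. It is not the published FADI algorithm and carries none of the guarantees stated above.

```python
import numpy as np


def local_sketch(X_k, omegas):
    """Work done on one (hypothetical) machine holding the row block X_k.

    Returns X_k^T X_k @ Omega for every sketch Omega, computed as
    X_k^T (X_k @ Omega) so the d x d matrix X_k^T X_k is never formed.
    """
    return [X_k.T @ (X_k @ omega) for omega in omegas]


def sketched_distributed_pca(blocks, K, L, p, seed=0):
    """Toy sketch-and-aggregate PCA over sample splits (row blocks).

    Illustrative assumptions, NOT the published FADI algorithm:
    Gaussian sketches of shape (d, p), an uncentered covariance X^T X / n,
    and merging the L subspace estimates via an SVD of their concatenation.
    """
    d = blocks[0].shape[1]
    n = sum(X_k.shape[0] for X_k in blocks)
    rng = np.random.default_rng(seed)
    omegas = [rng.standard_normal((d, p)) for _ in range(L)]

    # Distributed step along n: sum the d x p local sketches from every block.
    totals = [np.zeros((d, p)) for _ in range(L)]
    for X_k in blocks:
        for idx, Y in enumerate(local_sketch(X_k, omegas)):
            totals[idx] += Y

    # Parallel step: each of the L sketches yields its own K-dim subspace estimate.
    bases = []
    for Y in totals:
        U, _, _ = np.linalg.svd(Y / n, full_matrices=False)
        bases.append(U[:, :K])

    # Merge: the top-K left singular vectors of [V_1, ..., V_L] are the top-K
    # eigenvectors of sum_l V_l V_l^T, so no d x d matrix is needed.
    B = np.hstack(bases)
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :K]


# Usage on synthetic spiked data split across 4 hypothetical machines.
rng = np.random.default_rng(1)
n, d, K = 2000, 50, 3
V_true, _ = np.linalg.qr(rng.standard_normal((d, K)))
scores = rng.standard_normal((n, K)) * np.array([20.0, 15.0, 10.0])
X = scores @ V_true.T + rng.standard_normal((n, d))
blocks = np.array_split(X, 4)

V_hat = sketched_distributed_pca(blocks, K=K, L=5, p=10)  # L * p = 50 = d
err = np.linalg.norm(V_hat @ V_hat.T - V_true @ V_true.T, 2)
print(f"spectral-norm distance between subspaces: {err:.3f}")
```

Choosing L · p = d here only echoes the Lp ≥ d regime named in the abstract; this toy makes no claim to FADI's error rate. In this scheme each block returns only d × p matrices, so the d × d covariance is never formed or communicated.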

Biography

Xihong Lin, PhD, is Professor and former Chair of Biostatistics and Coordinating Director of the Program in Quantitative Genomics at the Harvard T.H. Chan School of Public Health, as well as Professor of Statistics at Harvard University. Dr. Lin's research interests lie in the development and application of scalable statistical and machine learning methods for the analysis of massive genetic and genomic data, along with complex epidemiological, biobank, and health data. Dr. Lin is an elected member of both the US National Academy of Sciences and the National Academy of Medicine. She received the 2002 Mortimer Spiegelman Award from the American Public Health Association, the 2006 Presidents' Award and the 2017 F.N. David Award from the Committee of Presidents of Statistical Societies (COPSS), the 2022 Jerome Sacks Award for Outstanding Cross-Disciplinary Research from the National Institute of Statistical Sciences, and the 2022 Marvin Zelen Leadership in Statistical Science Award. She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. She is a recipient of the MERIT Award (2007-2015) and the Outstanding Investigator Award (OIA) (R35) (2015-2029) from the National Cancer Institute (NCI). Dr. Lin is known for her contributions to epidemic modeling during the early phase of the COVID-19 pandemic. She served as Chair of COPSS (2010-2012) and has been a member of the Committee on Applied and Theoretical Statistics of the National Academy of Sciences. She is the former Coordinating Editor of Biometrics and the founding co-editor of Statistics in Biosciences, and has served on numerous NIH and NSF review panels.