News and Announcements

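Center Researchers Fan Xinyan (范新妍) and Li Yang (李扬) and Their Student Publish a Paper in JRSSC on Estimating Networks of Inter-Regional PM2.5 Concentration Relationships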

2022-06-26

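Fan Xinyan (范新妍) and Li Yang (李扬), researchers at our center, together with their student and other collaborators, have completed the paper “Heterogeneous graphical model for non-negative and non-Gaussian PM2.5 data”, which has been published in the Journal of the Royal Statistical Society Series C. The study addresses two challenges in PM2.5 concentration data: the values are non-negative, and they are heterogeneous across months. It models each month's concentration data with a Gaussian distribution truncated to the non-negative orthant, and builds an objective function from the score matching loss and a group fused lasso penalty. With this objective, the PM2.5 concentration graphical models for different months and regions are estimated jointly, which clusters the monthly datasets and selects the conditional dependence relationships among regional concentrations.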
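As a rough illustration of why a group fused lasso penalty can cluster the monthly datasets, the following minimal sketch computes one common form of such a penalty: the sum, over all pairs of datasets, of the Frobenius norm of the difference between their parameter matrices. This is not the paper's implementation. The function names `frobenius_distance` and `group_fused_lasso_penalty`, the tuning parameter `lam`, the all-pairs structure, and the toy matrices are assumptions made for illustration, and the score matching loss is omitted entirely.

```python
import itertools
import math


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def group_fused_lasso_penalty(thetas, lam=1.0):
    """Sum of pairwise Frobenius distances between parameter matrices.

    thetas: list of matrices (nested lists), one per dataset (for example, one per month).
    lam: hypothetical tuning parameter controlling the fusion strength.
    """
    total = 0.0
    for a, b in itertools.combinations(thetas, 2):
        # The whole difference matrix of a pair forms one group, so the pair
        # contributes zero only when the two matrices coincide exactly.
        total += frobenius_distance(a, b)
    return lam * total


# Toy example (assumed values): the first two datasets share a matrix, the third differs.
base = [[1.0, 0.5], [0.5, 1.0]]
other = [[1.0, 0.0], [0.0, 1.0]]
thetas = [base, base, other]
penalty = group_fused_lasso_penalty(thetas, lam=1.0)
print(penalty)
```

Because pairs with identical matrices contribute nothing to the penalty, minimizing a loss plus a term of this kind tends to make some datasets share exactly the same matrix, and datasets that share a matrix can then be read as one cluster.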

Paper Title

Heterogeneous graphical model for non-negative and non-Gaussian PM2.5 data

Abstract

Studies on the conditional relationships between PM2.5 concentrations among different regions are of great interest for the joint prevention and control of air pollution. Because of seasonal changes in atmospheric conditions, spatial patterns of PM2.5 may differ throughout the year. Additionally, concentration data are both non-negative and non-Gaussian. These data features pose significant challenges to existing methods. This study proposes a heterogeneous graphical model for non-negative and non-Gaussian data via the score matching loss. The proposed method simultaneously clusters multiple datasets and estimates a graph for variables with complex properties in each cluster. Furthermore, our model involves a network that indicates similarity among datasets, and this network can have additional applications. In simulation studies, the proposed method outperforms competing alternatives in both clustering and edge identification. We also analyse the PM2.5 concentrations' spatial correlations in Taiwan's regions using data obtained in year 2019 from 67 air-quality monitoring stations. The 12 months are clustered into four groups: January–March, April, May–September and October–December, and the corresponding graphs have 153, 57, 86 and 167 edges respectively. The results show obvious seasonality, which is consistent with the meteorological literature. Geographically, the PM2.5 concentrations of north and south Taiwan regions correlate more respectively. These results can provide valuable information for developing joint air-quality control strategies.

About the Authors

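Zhang Jiaqi (张嘉琪) is a PhD candidate at the School of Statistics, Renmin University of China. Main research interests include graphical models, community detection, and network resampling.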

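Fan Xinyan (范新妍) is a lecturer at the School of Statistics, Renmin University of China. Her research focuses on multi-source data analysis and high-dimensional data analysis. She has published more than ten papers in leading Chinese and international journals, including Journal of Multivariate Analysis, Computational Statistics and Data Analysis, Statistics in Medicine, Annals of Operations Research, Statistical Methods in Medical Research, Journal of the Royal Statistical Society Series C, and 统计研究 (Statistical Research).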

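Li Yang (李扬) is a professor and doctoral supervisor at the School of Statistics, Renmin University of China, where he also serves as vice dean and as director of the Statistical Consulting Research Center (统计咨询研究中心). He is an Elected Member of the International Statistical Institute, vice president of the China Business Statistics Society (中国商业统计学会), and chief supervisor of the Beijing Biomedical Statistics and Data Management Research Association (北京生物医学统计与数据管理研究会). His research focuses on the analysis of correlated data, model selection and uncertainty assessment, latent variable modeling, and clinical trial design. He has undertaken more than twenty research projects, including General Program projects of the National Natural Science Foundation of China and Major Projects of the National Statistical Science Research Program, and has published more than seventy papers in journals such as JASA, JAMA IM, Biometrics, Biostatistics, and 统计研究 (Statistical Research).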

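Ma Shuangge (马双鸽) is a professor in the Department of Biostatistics at Yale University, an Elected Member of the International Statistical Institute, and a Fellow of the American Statistical Association. His research focuses on biostatistics, genetic epidemiology, survival analysis, and high-dimensional data analysis. He serves as an associate editor for several international journals, including JASA, AISM, and Briefings in Bioinformatics, and has published several hundred papers in leading international journals such as Nature Genetics, JASA, The Annals of Statistics, Biometrika, and Briefings in Bioinformatics.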

Screenshot of the Published Paper