Center researchers Li Yang and Sun Yifan publish a paper in Statistical Methods in Medical Research
2020-07-22
Paper title
Semiparametric integrative interaction analysis for non-small-cell lung cancer
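About the authors
Li Yang is a professor at the School of Statistics, Renmin University of China, and director of the Statistical Consulting Research Center. Li is an Elected Member of the International Statistical Institute, a youth council member of the China chapter of the International Biometric Society, and chief supervisor of the Beijing Biomedical Statistics and Data Management Research Association. Li's research covers correlated data analysis, model selection and uncertainty assessment, latent variable modeling, and clinical trial design.
Sun Yifan (corresponding author) is an associate professor and doctoral supervisor at the School of Statistics, Renmin University of China, chair of the Department of Mathematical Statistics, and a member of the ninth council of the National Industrial Statistics Teaching and Research Association. Sun's main research interests include complex data analysis, network analysis, and optimization methods. Sun has published more than 20 papers in journals such as Statistics in Medicine and Statistical Research (统计研究), and has led six projects funded at the national, provincial, and ministerial levels.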
Abstract
In genomic analysis, it is significant though challenging to identify markers associated with cancer outcomes or phenotypes. Based on the biological mechanisms of cancers and the characteristics of datasets, we propose a novel integrative interaction approach under a semiparametric model, in which genetic and environmental factors are included as the parametric and nonparametric components, respectively. The goal of this approach is to identify the genetic factors and gene–gene interactions associated with cancer outcomes, while estimating the nonlinear effects of environmental factors. The proposed approach is based on the threshold gradient-directed regularisation technique. Simulation studies indicate that the proposed approach outperforms alternative methods at identifying the main effects and interactions, and has favourable estimation and prediction accuracy. We analysed non-small-cell lung carcinoma datasets from the Cancer Genome Atlas, and the results demonstrate that the proposed approach can identify markers with important implications and that it performs favourably in terms of prediction accuracy, identification stability, and computation cost.
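Abstract (translated from the Chinese version)
In genomic analysis, identifying markers associated with cancer prognosis or phenotypes is both highly meaningful and challenging. Based on the biological mechanisms of cancer and the characteristics of the datasets, this paper proposes a new integrative interaction analysis method under a semiparametric model, in which genetic and environmental factors enter as the parametric and nonparametric components, respectively. The method aims to identify the genes and gene-gene interactions associated with cancer prognosis while estimating the nonlinear effects of environmental factors on prognosis. It is built on the threshold gradient-directed regularisation technique. Numerical experiments show that the method outperforms alternative methods in identifying main effects and interactions, and that it achieves good estimation and prediction accuracy. The paper analyses non-small-cell lung cancer datasets from the Cancer Genome Atlas; the results show that the method can identify biologically meaningful markers and performs well in prediction accuracy, identification stability, and computation time.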