Submitted: 2025-01-06   Accepted: 2025-02-24   Last revised: 2025-02-14
Funding: Research Center for Mathematics and Finance
Multiple Robust Estimation in High-Dimensional Data with Outliers and Measurement Errors in Covariates
Sun Zongren
Sichuan University of Arts and Sciences, Sichuan, 635000
Abstract:
With the advance of science and technology and the growing use of computers, reliable models are needed for massive data sets. In high-dimensional settings, variable selection for causal inference has attracted increasing attention. Starting from the empirical likelihood function, this paper adds a Huber function and a weight function to the estimating equations that form the empirical likelihood constraints, thereby combining the empirical likelihood method with robust estimating equations, and then adds a penalty function to the objective function. Centering the covariate matrix in the estimating equations corrects the bias caused by outliers. To avoid distributional assumptions on the covariates and the measurement errors, and for computational convenience, a method is proposed that handles outliers and measurement errors simultaneously, so that robust estimation and variable selection are carried out at the same time. The approach also offers a new route to variable selection and causal effect estimation in real studies with high-dimensional covariates. Empirical analysis indicates that the proposed method is insensitive to measurement errors and has high estimation efficiency.
Key words: Variable selection, Constraint conditions, Outliers, Measurement errors
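
The abstract describes placing a Huber function and a weight function inside the estimating equations. As a rough illustration only, the sketch below shows how a Huber-type estimating function bounds the contribution of an outlying observation. It is not the paper's implementation: the linear model, the function names `huber_psi` and `estimating_function`, the constant weight, and the tuning constant `c = 1.345` (a conventional choice) are all assumptions made for this example.

```python
def huber_psi(r, c=1.345):
    """Huber's psi function: identity on [-c, c], clipped to +/-c outside.

    The tuning constant c = 1.345 is a conventional default, assumed here.
    """
    return max(-c, min(c, r))


def estimating_function(x, y, beta, weight=1.0, c=1.345):
    """One observation's contribution to a Huber-type estimating equation.

    Assumes a linear model y = x'beta + error (hypothetical for this sketch)
    and returns weight * psi_c(y - x'beta) * x as a list.
    """
    residual = y - sum(xj * bj for xj, bj in zip(x, beta))
    score = weight * huber_psi(residual, c)
    return [score * xj for xj in x]


beta = [1.0, 2.0]

# Ordinary observation: residual is about 0.3, inside [-c, c], so it is unchanged.
clean = estimating_function([1.0, 0.5], 2.3, beta)

# Outlying response: residual is 48.0, so psi clips it to c before weighting.
outlier = estimating_function([1.0, 0.5], 50.0, beta)

print(clean)
print(outlier)
```

Because the clipped residual never exceeds `c` in absolute value, a single extreme response cannot dominate the sum of estimating functions. A data-dependent weight (for example, one that downweights high-leverage covariate rows) could be passed through the `weight` argument, though the specific weight function used in the paper is not stated in the abstract.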