De-sparsified lasso

The de-sparsified lasso is a method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model.^[1]

High-dimensional linear model

Consider the linear model $Y=X\beta ^{0}+\epsilon$ with $n\times p$ design matrix $X=:[X_{1},...,X_{p}]$ (whose columns $X_{j}$ are $n\times 1$ vectors), noise $\epsilon \sim N_{n}(0,\sigma _{\epsilon }^{2}I)$ independent of $X$, and unknown $p\times 1$ regression vector $\beta ^{0}$.

A standard estimator of $\beta ^{0}$ in this setting is the Lasso: ${\hat {\beta }}^{n}(\lambda )={\underset {\beta \in \mathbb {R} ^{p}}{\operatorname {argmin} }}\ {\frac {1}{2n}}\left\|Y-X\beta \right\|_{2}^{2}+\lambda \left\|\beta \right\|_{1}$

The de-sparsified lasso modifies the Lasso estimator, which fulfills the Karush–Kuhn–Tucker conditions,^[2] by adding a correction term built from the residuals:

${\hat {\beta }}^{n}(\lambda ,M)={\hat {\beta }}^{n}(\lambda )+{\frac {1}{n}}MX^{T}(Y-X{\hat {\beta }}^{n}(\lambda ))$

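where $M\in \mathbb {R} ^{p\times p}$ is, in principle, an arbitrary matrix. In practice $M$ is chosen as a surrogate for the inverse of the sample covariance matrix ${\hat {\Sigma }}:=X^{T}X/n$, which is not invertible when $p>n$.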
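
As an illustration, the correction step can be sketched in Python with NumPy. The helper names (`lasso_cd`, `desparsify`), the coordinate-descent solver, the simulated data and the choice of $M$ are assumptions made for this example and are not taken from the cited paper. The example deliberately uses $n>p$ so that $X^{T}X/n$ is invertible: taking $M$ to be its exact inverse turns the de-sparsified estimate into the ordinary least squares solution, which gives a simple sanity check. When $p>n$ no exact inverse exists and a surrogate has to be used instead.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate descent for (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # X_j^T X_j / n for each column
    resid = y.astype(float)                # current residual y - X beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # X_j^T (partial residual) / n, where the partial residual
            # leaves out the contribution of coordinate j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def desparsify(X, y, beta_hat, M):
    """De-sparsified estimate: beta_hat + (1/n) * M X^T (y - X beta_hat)."""
    n = X.shape[0]
    return beta_hat + M @ X.T @ (y - X @ beta_hat) / n

# Simulated data (illustrative assumption): sparse truth, Gaussian design.
rng = np.random.default_rng(0)
n, p = 60, 8
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:2] = [2.0, -1.0]
y = X @ beta0 + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.1)
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat)               # exact inverse, available since n > p
b = desparsify(X, y, beta_hat, M)          # coincides with least squares here
```

The same algebra shows why $M$ should approximately invert $X^{T}X/n$: substituting $Y=X\beta ^{0}+\epsilon$ gives ${\hat {\beta }}^{n}(\lambda ,M)-\beta ^{0}=(I-MX^{T}X/n)({\hat {\beta }}^{n}(\lambda )-\beta ^{0})+{\frac {1}{n}}MX^{T}\epsilon$, so the first term is small when $MX^{T}X/n$ is close to the identity.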

Generalized linear model

De-sparsifying $\ell _{1}$-norm penalized estimators, together with the corresponding theory, also extends to models with convex loss functions such as generalized linear models.

Consider $1\times p$ vectors of covariables $x_{i}\in {\mathcal {X}}\subset \mathbb {R} ^{p}$ and univariate responses $y_{i}\in {\mathcal {Y}}\subset \mathbb {R}$ for $i=1,...,n$, together with a loss function $\rho _{\beta }(y,x)=\rho (y,x\beta )$, $\beta \in \mathbb {R} ^{p}$, which is assumed to be strictly convex in $\beta$.

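The $\ell _{1}$-norm regularized estimator is ${\hat {\beta }}={\underset {\beta }{\operatorname {argmin} }}(P_{n}\rho _{\beta }+\lambda \left\|\beta \right\|_{1})$, where $P_{n}\rho _{\beta }:={\frac {1}{n}}\sum _{i=1}^{n}\rho _{\beta }(y_{i},x_{i})$ denotes the empirical average of the loss.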
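
As a worked example, and as an assumption made here for illustration, take the squared-error loss $\rho (y,x\beta )={\frac {1}{2}}(y-x\beta )^{2}$. The empirical loss then becomes

$P_{n}\rho _{\beta }={\frac {1}{n}}\sum _{i=1}^{n}{\frac {1}{2}}(y_{i}-x_{i}\beta )^{2}={\frac {1}{2n}}\left\|Y-X\beta \right\|_{2}^{2}$

so the regularized estimator reduces to the Lasso of the linear model, ${\underset {\beta }{\operatorname {argmin} }}\ {\frac {1}{2n}}\left\|Y-X\beta \right\|_{2}^{2}+\lambda \left\|\beta \right\|_{1}$.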

As in the linear model, a surrogate inverse is obtained by the nodewise lasso, here defined with a matrix as input. Denote by ${\hat {\Sigma }}$ the matrix that is to be approximately inverted. For each $j=1,...,p$, the nodewise lasso is

${\hat {\gamma }}_{j}:={\underset {\gamma \in \mathbb {R} ^{p-1}}{\operatorname {argmin} }}\left({\hat {\Sigma }}_{j,j}-2{\hat {\Sigma }}_{j,/j}\gamma +\gamma ^{T}{\hat {\Sigma }}_{/j,/j}\gamma +2\lambda _{j}\left\|\gamma \right\|_{1}\right)$

where ${\hat {\Sigma }}_{j,/j}$ denotes the $j$th row of ${\hat {\Sigma }}$ without the diagonal element $(j,j)$, and ${\hat {\Sigma }}_{/j,/j}$ is the submatrix obtained by removing the $j$th row and $j$th column.
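
The nodewise lasso with matrix input can be sketched as follows in Python with NumPy. The function names, the coordinate-descent solver, the use of a single $\lambda$ for all $j$ and the test matrix are assumptions made for this example. The step that assembles the rows of the surrogate inverse uses ${\hat {\tau }}_{j}^{2}={\hat {\Sigma }}_{j,j}-{\hat {\Sigma }}_{j,/j}{\hat {\gamma }}_{j}$, with entry $1/{\hat {\tau }}_{j}^{2}$ on the diagonal and $-{\hat {\gamma }}_{j}/{\hat {\tau }}_{j}^{2}$ off the diagonal; this is stated here as the usual construction and should be checked against [1] before relying on it.

```python
import numpy as np

def nodewise_lasso_row(Sigma, j, lam, n_iter=500):
    """Minimise Sigma_jj - 2 Sigma_{j,/j} g + g^T Sigma_{/j,/j} g + 2 lam ||g||_1."""
    p = Sigma.shape[0]
    idx = [k for k in range(p) if k != j]  # all indices except j
    A = Sigma[np.ix_(idx, idx)]            # Sigma without row and column j
    c = Sigma[idx, j]                      # column j of Sigma without entry (j, j)
    gamma = np.zeros(p - 1)
    for _ in range(n_iter):
        for k in range(p - 1):
            if A[k, k] <= 0.0:
                continue
            # c_k minus the contribution of all other coordinates
            partial = c[k] - A[k] @ gamma + A[k, k] * gamma[k]
            gamma[k] = np.sign(partial) * max(abs(partial) - lam, 0.0) / A[k, k]
    return gamma, idx

def approximate_inverse(Sigma, lam, n_iter=500):
    """Assemble a surrogate inverse row by row from the nodewise solutions."""
    p = Sigma.shape[0]
    Theta = np.zeros((p, p))
    for j in range(p):
        gamma, idx = nodewise_lasso_row(Sigma, j, lam, n_iter)
        tau2 = Sigma[j, j] - Sigma[j, idx] @ gamma
        Theta[j, j] = 1.0 / tau2
        Theta[j, idx] = -gamma / tau2
    return Theta

# Illustrative input (assumption): a well-conditioned Toeplitz matrix.
p = 5
Sigma = np.array([[0.5 ** abs(i - k) for k in range(p)] for i in range(p)])
Theta = approximate_inverse(Sigma, lam=0.1)
```

With `lam=0.0` and a positive definite input the construction reproduces the exact inverse, since each row then solves an unpenalized regression of one coordinate on the others. A positive `lam` trades exactness for sparsity in the rows of `Theta`.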

References

[1] van de Geer, Sara; Bühlmann, Peter; Ritov, Ya'acov; Dezeure, Ruben (2014). "On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models". The Annals of Statistics. 42 (3): 1162–1202. arXiv:1303.0518. doi:10.1214/14-AOS1221. S2CID 9663766.
[2] Tibshirani, Ryan; Gordon, Geoff. "Karush-Kuhn-Tucker conditions" (PDF).
