Sub-Gaussian distribution

In probability theory, a subgaussian distribution, the distribution of a subgaussian random variable, is a probability distribution with strong tail decay. More specifically, the tails of a subgaussian distribution are dominated by (i.e. decay at least as fast as) the tails of a Gaussian. This property gives subgaussian distributions their name.

Often in analysis, we divide an object (such as a random variable) into two parts, a central bulk and a distant tail, then analyze each separately. In probability, this division usually goes like "Everything interesting happens near the center. The tail event is so rare that we may safely ignore it." Subgaussian distributions are worthy of study because the Gaussian distribution is well understood, so we can give sharp bounds on the rarity of the tail event. The subexponential distributions are worthy of study for the same reason.

Formally, the probability distribution of a random variable $X$ is called subgaussian if there is a positive constant $C$ such that for every $t\geq 0$ ,

\operatorname {P} (|X|\geq t)\leq 2\exp {(-t^{2}/C^{2})}.
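As a small illustration of the tail condition, the following sketch checks it for a standard normal variable on a grid of $t$ values. The constant $C={\sqrt {2}}$ and the grid are assumptions chosen for this example; the exact two-sided Gaussian tail is computed with the complementary error function.

```python
import math

def gaussian_two_sided_tail(t: float) -> float:
    """Exact P(|Z| >= t) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def subgaussian_tail_bound(t: float, C: float) -> float:
    """Right-hand side of the definition: 2 * exp(-t^2 / C^2)."""
    return 2.0 * math.exp(-(t * t) / (C * C))

C = math.sqrt(2.0)                     # assumed constant for the standard normal
ts = [0.1 * k for k in range(0, 81)]   # grid t = 0.0, 0.1, ..., 8.0
tail_condition_holds = all(
    gaussian_two_sided_tail(t) <= subgaussian_tail_bound(t, C) for t in ts
)
print(tail_condition_holds)
```

A grid check of this kind is only a sanity test, not a proof; for the Gaussian the inequality follows from the Chernoff bound.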

There are many equivalent definitions. For example, a random variable $X$ is sub-Gaussian iff its tail probabilities are upper bounded (up to a constant) by the tail probabilities of a Gaussian:

P(|X|\geq t)\leq cP(|Z|\geq t)\quad \forall t>0

where $c\geq 0$ is a constant and $Z$ is a mean-zero Gaussian random variable (Theorem 2.6 ^[1]).

Definitions

Subgaussian norm

The subgaussian norm of $X$ , denoted as $\Vert X\Vert _{\psi _{2}}$ , is

\Vert X\Vert _{\psi _{2}}=\inf \left\{c>0:\operatorname {E} \left[\exp {\left({\frac {X^{2}}{c^{2}}}\right)}\right]\leq 2\right\}.

In other words, it is the Orlicz norm of $X$ generated by the Orlicz function $\Phi (u)=e^{u^{2}}-1$ . By condition (2) below, subgaussian random variables can be characterized as those random variables with finite subgaussian norm.

Variance proxy

If there exists some $s^{2}$ such that $\operatorname {E} [e^{(X-\operatorname {E} [X])t}]\leq e^{\frac {s^{2}t^{2}}{2}}$ for all $t$ , then $s^{2}$ is called a variance proxy, and the smallest such $s^{2}$ is called the optimal variance proxy and denoted by $\Vert X\Vert _{\mathrm {vp} }^{2}$ .

Since $\operatorname {E} [e^{(X-\operatorname {E} [X])t}]=e^{\frac {\sigma ^{2}t^{2}}{2}}$ when $X\sim {\mathcal {N}}(\mu ,\sigma ^{2})$ is Gaussian, we then have $\|X\|_{vp}^{2}=\sigma ^{2}$ , as it should.

Equivalent definitions

Let $X$ be a random variable. The following conditions are equivalent: (Proposition 2.5.2 ^[2])

(1) Tail probability bound: $\operatorname {P} (|X|\geq t)\leq 2\exp {(-t^{2}/K_{1}^{2})}$ for all $t\geq 0$ , where $K_{1}$ is a positive constant;
(2) Finite subgaussian norm: $\Vert X\Vert _{\psi _{2}}=K_{2}<\infty$ ;
(3) Moment: $\operatorname {E} |X|^{p}\leq 2K_{3}^{p}\Gamma \left({\frac {p}{2}}+1\right)$ for all $p\geq 1$ , where $K_{3}$ is a positive constant and $\Gamma$ is the Gamma function;
(4) Moment: $\operatorname {E} |X|^{p}\leq K^{p}p^{p/2}$ for all $p\geq 1$ , where $K$ is a positive constant;
(5) Moment-generating function (of $X$ ), or variance proxy^[3]^[4] : $\operatorname {E} [e^{(X-\operatorname {E} [X])t}]\leq e^{\frac {K^{2}t^{2}}{2}}$ for all $t$ , where $K$ is a positive constant;
(6) Moment-generating function (of $X^{2}$ ): for some $K>0$ , $\operatorname {E} [e^{X^{2}t^{2}}]\leq e^{K^{2}t^{2}}$ for all $t\in [-1/K,+1/K]$ ;
(7) Union bound: for some $c>0$ , $\operatorname {E} [\max\{|X_{1}-\operatorname {E} [X]|,\ldots ,|X_{n}-\operatorname {E} [X]|\}]\leq c{\sqrt {\log n}}$ for all $n>c$ , where $X_{1},\ldots ,X_{n}$ are i.i.d. copies of $X$ ;
(8) Subexponential: $X^{2}$ has a subexponential distribution.

Furthermore, the constant $K$ is the same in the definitions (1) to (5), up to an absolute constant. So for example, given a random variable satisfying (1) and (2), the minimal constants $K_{1},K_{2}$ in the two definitions satisfy $K_{1}\leq cK_{2},K_{2}\leq c'K_{1}$ , where $c,c'$ are constants independent of the random variable.

Proof of equivalence

As an example, the first four definitions are equivalent by the proof below.

Proof. $(1)\implies (3)$ By the layer cake representation,

{\begin{aligned}\operatorname {E} |X|^{p}&=\int _{0}^{\infty }\operatorname {P} (|X|^{p}\geq t)dt\\&=\int _{0}^{\infty }pt^{p-1}\operatorname {P} (|X|\geq t)dt\\&\leq 2\int _{0}^{\infty }pt^{p-1}\exp \left(-{\frac {t^{2}}{K_{1}^{2}}}\right)dt\\\end{aligned}}

After a change of variables $u=t^{2}/K_{1}^{2}$ , we find that

{\begin{aligned}\operatorname {E} |X|^{p}&\leq 2K_{1}^{p}{\frac {p}{2}}\int _{0}^{\infty }u^{{\frac {p}{2}}-1}e^{-u}du\\&=2K_{1}^{p}{\frac {p}{2}}\Gamma \left({\frac {p}{2}}\right)\\&=2K_{1}^{p}\Gamma \left({\frac {p}{2}}+1\right).\end{aligned}}
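The moment bound just derived can be sanity-checked on the standard normal distribution. The sketch below assumes $K_{1}={\sqrt {2}}$ (the constant for which the tail bound (1) holds for a standard normal) and uses the closed form $\operatorname {E} |Z|^{p}=2^{p/2}\Gamma ((p+1)/2)/{\sqrt {\pi }}$ ; the grid of exponents $p$ is an arbitrary choice for illustration.

```python
import math

def gaussian_abs_moment(p: float) -> float:
    """E|Z|^p for Z ~ N(0, 1): 2^(p/2) * Gamma((p + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def moment_bound(p: float, K1: float) -> float:
    """Right-hand side of condition (3): 2 * K1^p * Gamma(p/2 + 1)."""
    return 2.0 * K1 ** p * math.gamma(p / 2.0 + 1.0)

K1 = math.sqrt(2.0)                           # assumed tail constant for N(0, 1)
ps = [1.0 + 0.5 * k for k in range(0, 39)]    # p = 1.0, 1.5, ..., 20.0
moment_condition_holds = all(
    gaussian_abs_moment(p) <= moment_bound(p, K1) for p in ps
)
print(moment_condition_holds)
```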

$(3)\implies (2)$ By the Taylor series $e^{x}=1+\sum _{p=1}^{\infty }{\frac {x^{p}}{p!}}$ ,

{\begin{aligned}\operatorname {E} [\exp {(\lambda X^{2})}]&=1+\sum _{p=1}^{\infty }{\frac {\lambda ^{p}\operatorname {E} {[X^{2p}]}}{p!}}\\&\leq 1+\sum _{p=1}^{\infty }{\frac {2\lambda ^{p}K_{3}^{2p}\Gamma \left(p+1\right)}{p!}}\\&=1+2\sum _{p=1}^{\infty }\lambda ^{p}K_{3}^{2p}\\&=2\sum _{p=0}^{\infty }\lambda ^{p}K_{3}^{2p}-1\\&={\frac {2}{1-\lambda K_{3}^{2}}}-1\quad {\text{for }}\lambda K_{3}^{2}<1,\end{aligned}}

which is less than or equal to $2$ for $\lambda \leq {\frac {1}{3K_{3}^{2}}}$ . Let $K_{2}\geq 3^{\frac {1}{2}}K_{3}$ ; then $\operatorname {E} [\exp {(X^{2}/K_{2}^{2})}]\leq 2$ .

$(2)\implies (1)$ By Markov's inequality,

\operatorname {P} (|X|\geq t)=\operatorname {P} \left(\exp \left({\frac {X^{2}}{K_{2}^{2}}}\right)\geq \exp \left({\frac {t^{2}}{K_{2}^{2}}}\right)\right)\leq {\frac {\operatorname {E} [\exp {(X^{2}/K_{2}^{2})}]}{\exp \left({\frac {t^{2}}{K_{2}^{2}}}\right)}}\leq 2\exp \left(-{\frac {t^{2}}{K_{2}^{2}}}\right).

$(3)\iff (4)$ by the asymptotic formula for the gamma function (Stirling's formula): $\Gamma (p/2+1)\sim {\sqrt {\pi p}}\left({\frac {p}{2e}}\right)^{p/2}$ .

From the proof, we can extract a cycle of three inequalities:

If $\operatorname {P} (|X|\geq t)\leq 2\exp {(-t^{2}/K^{2})}$ , then $\operatorname {E} |X|^{p}\leq 2K^{p}\Gamma \left({\frac {p}{2}}+1\right)$ for all $p\geq 1$ .
If $\operatorname {E} |X|^{p}\leq 2K^{p}\Gamma \left({\frac {p}{2}}+1\right)$ for all $p\geq 1$ , then $\|X\|_{\psi _{2}}\leq 3^{\frac {1}{2}}K$ .
If $\|X\|_{\psi _{2}}\leq K$ , then $\operatorname {P} (|X|\geq t)\leq 2\exp {(-t^{2}/K^{2})}$ .

In particular, the constants $K$ provided by the definitions are the same up to a constant factor, so we can say that the definitions are equivalent up to a constant independent of $X$ .

Similarly, because up to a positive multiplicative constant, $\Gamma (p/2+1)=p^{p/2}\times ((2e)^{-1/2}p^{1/(2p)})^{p}$ for all $p\geq 1$ , and the factor $(2e)^{-1/2}p^{1/(2p)}$ is bounded above and below by positive constants, the definitions (3) and (4) are also equivalent up to a constant.

Basic properties

Proposition.

If $X$ is subgaussian, and $k>0$ , then $\|kX\|_{\psi _{2}}=k\|X\|_{\psi _{2}}$ and $\|kX\|_{vp}=k\|X\|_{vp}$ .
If $X,Y$ are subgaussian, then $\|X+Y\|_{vp}^{2}\leq (\|X\|_{vp}+\|Y\|_{vp})^{2}$ .

Proposition. (Chernoff bound) If $X$ is subgaussian, then $Pr(X\geq t)\leq e^{-{\frac {t^{2}}{2\|X\|_{vp}^{2}}}}$ for all $t\geq 0$ .

Definition. $X\lesssim X'$ means that $X\leq CX'$ , where the positive constant $C$ is independent of $X$ and $X'$ .

Proposition. If $X$ is subgaussian, then $\|X-E[X]\|_{\psi _{2}}\lesssim \|X\|_{\psi _{2}}$ .

Proof. By the triangle inequality, $\|X-E[X]\|_{\psi _{2}}\leq \|X\|_{\psi _{2}}+\|E[X]\|_{\psi _{2}}$ . Since $E[X]$ is a constant, its subgaussian norm is $\|E[X]\|_{\psi _{2}}={\frac {1}{\sqrt {\ln 2}}}|E[X]|\leq {\frac {1}{\sqrt {\ln 2}}}E[|X|]\lesssim E[|X|]$ . By the equivalence of definitions (2) and (4) of subgaussianity, given above (with $p=1$ ), we have $E[|X|]\lesssim \|X\|_{\psi _{2}}$ .

Proposition. If $X,Y$ are subgaussian and independent, then $\|X+Y\|_{vp}^{2}\leq \|X\|_{vp}^{2}+\|Y\|_{vp}^{2}$ .

Proof. If independent, then use that the cumulant of independent random variables is additive. That is, $\ln \operatorname {E} [e^{t(X+Y)}]=\ln \operatorname {E} [e^{tX}]+\ln \operatorname {E} [e^{tY}]$ .

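If $X$ and $Y$ are not independent, the weaker bound $\|X+Y\|_{vp}^{2}\leq (\|X\|_{vp}+\|Y\|_{vp})^{2}$ from the first proposition still holds. Assume without loss of generality that $X$ and $Y$ have mean zero. By Hölder's inequality, for any $p,q>1$ with $1/p+1/q=1$ we have

E[e^{t(X+Y)}]=\|e^{tX}e^{tY}\|_{1}\leq \|e^{tX}\|_{p}\|e^{tY}\|_{q}\leq e^{{\frac {1}{2}}t^{2}(p\|X\|_{vp}^{2}+q\|Y\|_{vp}^{2})}

Solving the optimization problem

{\begin{cases}\min p\|X\|_{vp}^{2}+q\|Y\|_{vp}^{2}\\1/p+1/q=1\end{cases}}

gives $p=1+\|Y\|_{vp}/\|X\|_{vp}$ with minimum value $(\|X\|_{vp}+\|Y\|_{vp})^{2}$ , and we obtain the result.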
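The optimization over the Hölder exponents in the proof above can be checked numerically. The sketch below uses hypothetical values $a=\|X\|_{vp}$ and $b=\|Y\|_{vp}$ , eliminates $q$ through $q=p/(p-1)$ , and compares a crude grid search with the closed form $(a+b)^{2}$ attained at $p=1+b/a$ .

```python
def holder_objective(p: float, a: float, b: float) -> float:
    """p * a^2 + q * b^2 with q = p / (p - 1), so that 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return p * a * a + q * b * b

def minimize_over_grid(a: float, b: float, steps: int = 200_000) -> float:
    """Crude grid search over p in (1, 51]; adequate for a sanity check only."""
    best = float("inf")
    for k in range(1, steps + 1):
        p = 1.0 + 50.0 * k / steps
        best = min(best, holder_objective(p, a, b))
    return best

a, b = 1.5, 0.5           # hypothetical values of ||X||_vp and ||Y||_vp
p_star = 1.0 + b / a      # stationary point from the first-order condition
closed_form = (a + b) ** 2
numeric_minimum = minimize_over_grid(a, b)
print(p_star, closed_form, numeric_minimum)
```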

Corollary. Linear sums of subgaussian random variables are subgaussian.

Strictly subgaussian

Expanding the cumulant generating function of a mean-zero subgaussian random variable $X$ with variance proxy $s^{2}$ :

{\frac {1}{2}}s^{2}t^{2}\geq \ln \operatorname {E} [e^{tX}]={\frac {1}{2}}\mathrm {Var} [X]t^{2}+{\frac {\kappa _{3}}{3!}}t^{3}+\cdots

where $\kappa _{3}$ is the third cumulant. Dividing by $t^{2}$ and letting $t\to 0$ , we find that $\mathrm {Var} [X]\leq \|X\|_{\mathrm {vp} }^{2}$ . A random variable $X$ that attains equality, $\mathrm {Var} [X]=\|X\|_{\mathrm {vp} }^{2}$ , is called strictly subgaussian.

Properties

Theorem.^[5] Let $X$ be a subgaussian random variable with mean zero. If all zeros of its characteristic function are real, then $X$ is strictly subgaussian.

Corollary. If $X_{1},\dots ,X_{n}$ are independent and strictly subgaussian, then any linear sum of them is strictly subgaussian.

Examples

By calculating the characteristic functions, we can show that some distributions are strictly subgaussian: symmetric uniform distribution, symmetric Bernoulli distribution.

Since a symmetric uniform distribution is strictly subgaussian, its convolution with itself is strictly subgaussian. That is, the symmetric triangular distribution is strictly subgaussian.

Since the symmetric Bernoulli distribution is strictly subgaussian, any symmetric Binomial distribution is strictly subgaussian.
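For the two basic examples, strict subgaussianity means that the moment-generating function is dominated by that of a Gaussian with the same variance. The sketch below checks this on a grid for $U(-1,1)$ (variance $1/3$ , moment-generating function $\sinh(t)/t$ ) and for the symmetric Bernoulli distribution on $\{-1,+1\}$ (variance $1$ , moment-generating function $\cosh(t)$ ). The grid range is an arbitrary assumption, and a grid check is not a proof.

```python
import math

def mgf_uniform(t: float) -> float:
    """E[exp(tX)] for X ~ U(-1, 1): sinh(t) / t, with value 1 at t = 0."""
    return 1.0 if t == 0.0 else math.sinh(t) / t

def mgf_rademacher(t: float) -> float:
    """E[exp(tX)] for X = +1 or -1 with probability 1/2 each."""
    return math.cosh(t)

def gaussian_mgf(t: float, variance: float) -> float:
    """exp(variance * t^2 / 2), the moment-generating function of N(0, variance)."""
    return math.exp(variance * t * t / 2.0)

ts = [0.05 * k for k in range(-400, 401)]   # t from -20 to 20
uniform_ok = all(mgf_uniform(t) <= gaussian_mgf(t, 1.0 / 3.0) for t in ts)
rademacher_ok = all(mgf_rademacher(t) <= gaussian_mgf(t, 1.0) for t in ts)
print(uniform_ok, rademacher_ok)
```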

Examples


Distribution	$\|X\|_{\psi _{2}}$	$\|X\|_{vp}^{2}$	Strictly subgaussian?
Gaussian distribution ${\mathcal {N}}(0,1)$	${\sqrt {8/3}}$	$1$	Yes
mean-zero Bernoulli distribution $p\delta _{q}+q\delta _{-p}$	solution to $pe^{(q/t)^{2}}+qe^{(p/t)^{2}}=2$	${\frac {p-q}{2(\log p-\log q)}}$	Iff $p=0,1/2,1$
symmetric Bernoulli distribution ${\frac {1}{2}}\delta _{1/2}+{\frac {1}{2}}\delta _{-1/2}$	${\frac {1}{2{\sqrt {\ln 2}}}}$	$1/4$	Yes
uniform distribution $U(-1,1)$	solution to $\int _{0}^{1}e^{x^{2}/t^{2}}dx=2$ , approximately 0.7727	$1/3$	Yes
arbitrary distribution on interval $[a,b]$		$\leq \left({\frac {b-a}{2}}\right)^{2}$

The optimal variance proxy $\Vert X\Vert _{\mathrm {vp} }^{2}$ is known for many standard probability distributions, including the beta, Bernoulli, Dirichlet^[6], Kumaraswamy, triangular^[7], truncated Gaussian, and truncated exponential^[8].
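The subgaussian norm of $U(-1,1)$ listed in the table has no closed form, but the defining equation can be solved numerically. The following sketch is one possible way to do it, assuming Simpson's rule for the integral and bisection for the root; the bracketing interval and step counts are arbitrary choices made for this example.

```python
import math

def expected_exp_square_uniform(c: float, n: int = 2000) -> float:
    """E[exp(X^2 / c^2)] for X ~ U(-1, 1), i.e. the integral of exp(x^2 / c^2)
    over [0, 1], computed with the composite Simpson rule (n must be even)."""
    h = 1.0 / n
    total = 1.0 + math.exp(1.0 / (c * c))   # endpoints x = 0 and x = 1
    for k in range(1, n):
        x = k * h
        total += (4.0 if k % 2 else 2.0) * math.exp((x * x) / (c * c))
    return total * h / 3.0

def psi2_norm_uniform(lo: float = 0.3, hi: float = 3.0, iters: int = 60) -> float:
    """Bisection for the c with E[exp(X^2 / c^2)] = 2; the expectation decreases in c."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_exp_square_uniform(mid) > 2.0:
            lo = mid    # c is too small
        else:
            hi = mid
    return hi

psi2_uniform = psi2_norm_uniform()
print(round(psi2_uniform, 4))
```

The same bisection pattern applies to any distribution for which $\operatorname {E} [\exp(X^{2}/c^{2})]$ can be evaluated, since that expectation is monotone in $c$ .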

Bernoulli distribution

Let $p,q$ be two positive numbers with $p+q=1$ . Let $X$ have the centered Bernoulli distribution $p\delta _{q}+q\delta _{-p}$ , so that it has mean zero. Then $\Vert X\Vert _{\mathrm {vp} }^{2}={\frac {p-q}{2(\log p-\log q)}}$ .^[9] Its subgaussian norm is $t$ , where $t$ is the unique positive solution to $pe^{(q/t)^{2}}+qe^{(p/t)^{2}}=2$ .

Let $X$ be a random variable with symmetric Bernoulli distribution (or Rademacher distribution). That is, $X$ takes values $-1$ and $1$ with probabilities $1/2$ each. Since $X^{2}=1$ , it follows that

\Vert X\Vert _{\psi _{2}}=\inf \left\{c>0:\operatorname {E} \left[\exp {\left({\frac {X^{2}}{c^{2}}}\right)}\right]\leq 2\right\}=\inf \left\{c>0:\operatorname {E} \left[\exp {\left({\frac {1}{c^{2}}}\right)}\right]\leq 2\right\}={\frac {1}{\sqrt {\ln 2}}},

and hence $X$ is a subgaussian random variable.
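The closed form for the optimal variance proxy of the centered Bernoulli distribution can be compared with a direct numerical evaluation of $\sup _{t\neq 0}2\ln \operatorname {E} [e^{tX}]/t^{2}$ . The sketch below does this by grid search; the parameter $p=0.3$ , the search range and the grid size are assumptions chosen for illustration.

```python
import math

def log_mgf_centered_bernoulli(t: float, p: float) -> float:
    """ln E[exp(tX)] for X = q with probability p and X = -p with probability q = 1 - p."""
    q = 1.0 - p
    return math.log(p * math.exp(t * q) + q * math.exp(-t * p))

def variance_proxy_numeric(p: float, t_max: float = 10.0, steps: int = 20_000) -> float:
    """Grid approximation of the supremum over t != 0 of 2 * ln E[exp(tX)] / t^2."""
    best = 0.0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        for s in (t, -t):
            best = max(best, 2.0 * log_mgf_centered_bernoulli(s, p) / (s * s))
    return best

def variance_proxy_formula(p: float) -> float:
    """Closed form (p - q) / (2 (ln p - ln q)), for p different from 1/2."""
    q = 1.0 - p
    return (p - q) / (2.0 * (math.log(p) - math.log(q)))

p = 0.3   # hypothetical parameter
numeric_vp = variance_proxy_numeric(p)
formula_vp = variance_proxy_formula(p)
print(numeric_vp, formula_vp, p * (1.0 - p))
```

For $p\neq 1/2$ the printed variance proxy exceeds the variance $pq$ , which is consistent with the table entry stating that this distribution is strictly subgaussian only for $p=0,1/2,1$ .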

Bounded distributions

[Figure: some commonly used bounded distributions.]

Bounded distributions have no tail at all, so clearly they are subgaussian.

If $X$ is bounded within the interval $[a,b]$ , then Hoeffding's lemma gives $\Vert X\Vert _{\mathrm {vp} }^{2}\leq \left({\frac {b-a}{2}}\right)^{2}$ , which matches the largest variance such a variable can have, $\mathrm {Var} [X]\leq \left({\frac {b-a}{2}}\right)^{2}$ . Applying the Chernoff bound to a sum of independent bounded random variables then yields Hoeffding's inequality.

Convolutions

[Figure: density of a mixture of three normal distributions (μ = 5, 10, 15, σ = 2) with equal weights. Each component is shown as a weighted density (each integrating to 1/3).]

Since the sum of subgaussian random variables is still subgaussian, the convolution of subgaussian distributions is still subgaussian. In particular, any convolution of the normal distribution with any bounded distribution is subgaussian.

Mixtures

Given subgaussian distributions $X_{1},X_{2},\dots ,X_{n}$ and mixing probabilities $p_{1},\dots ,p_{n}$ that sum to one, we can construct an additive mixture $X$ as follows: first randomly pick a number $i\in \{1,2,\dots ,n\}$ with probability $p_{i}$ , then sample from $X_{i}$ .

Since $\operatorname {E} \left[\exp {\left({\frac {X^{2}}{c^{2}}}\right)}\right]=\sum _{i}p_{i}\operatorname {E} \left[\exp {\left({\frac {X_{i}^{2}}{c^{2}}}\right)}\right]$ we have $\|X\|_{\psi _{2}}\leq \max _{i}\|X_{i}\|_{\psi _{2}}$ , and so the mixture is subgaussian.

In particular, any gaussian mixture is subgaussian.

More generally, the mixture of infinitely many subgaussian distributions is also subgaussian, if the subgaussian norm has a finite supremum: $\|X\|_{\psi _{2}}\leq \sup _{i}\|X_{i}\|_{\psi _{2}}$ .
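The mixture bound can be illustrated with components whose subgaussian norm is known exactly. The sketch below assumes each component is a scaled symmetric Bernoulli variable $a_{i}\varepsilon$ with $\varepsilon =\pm 1$ , so that $\|a_{i}\varepsilon \|_{\psi _{2}}=|a_{i}|/{\sqrt {\ln 2}}$ ; the scales and mixing weights are hypothetical values.

```python
import math

def scaled_rademacher_norm(a: float) -> float:
    """psi_2 norm of a * eps, where eps = +1 or -1 with probability 1/2 each."""
    return abs(a) / math.sqrt(math.log(2.0))

def mixture_norm(scales, weights, iters: int = 100) -> float:
    """psi_2 norm of the mixture that picks scale a_i with probability w_i.

    Solves sum_i w_i * exp(a_i^2 / c^2) = 2 by bisection; the sum decreases in c,
    is at least 2 at the smallest component norm and at most 2 at the largest.
    """
    norms = [scaled_rademacher_norm(a) for a in scales]
    lo, hi = min(norms), max(norms)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(w * math.exp((a * a) / (mid * mid)) for a, w in zip(scales, weights))
        if total > 2.0:
            lo = mid
        else:
            hi = mid
    return hi

scales = [0.5, 1.0, 2.0]     # hypothetical component scales (all nonzero)
weights = [0.2, 0.5, 0.3]    # hypothetical mixing probabilities, summing to 1
mix_norm = mixture_norm(scales, weights)
max_component_norm = max(scaled_rademacher_norm(a) for a in scales)
min_component_norm = min(scaled_rademacher_norm(a) for a in scales)
print(min_component_norm, mix_norm, max_component_norm)
```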

Subgaussian random vectors

So far, we have discussed subgaussianity for real-valued random variables. We can also define subgaussianity for random vectors. The purpose of subgaussianity is to make the tails decay fast, so we generalize accordingly: a subgaussian random vector is a random vector where the tail decays fast.

Let $X$ be a random vector taking values in $\mathbb {R} ^{n}$ .

Definition.

$\|X\|_{\psi _{2}}:=\sup _{v\in S^{n-1}}\|v^{T}X\|_{\psi _{2}}$ , where $S^{n-1}$ is the unit sphere in $\mathbb {R} ^{n}$ .
$X$ is subgaussian iff $\|X\|_{\psi _{2}}<\infty$ .

Theorem. (Theorem 3.4.6 ^[2]) For any positive integer $n$ , the uniformly distributed random vector $X\sim U({\sqrt {n}}S^{n-1})$ is subgaussian, with $\|X\|_{\psi _{2}}\lesssim 1$ .

This is not so surprising, because as $n\to \infty$ , the projection of $U({\sqrt {n}}S^{n-1})$ to the first coordinate converges in distribution to the standard normal distribution.

Maximum inequalities

Proposition. If $X_{1},\dots ,X_{n}$ are mean-zero subgaussians, with $\|X_{i}\|_{vp}^{2}\leq \sigma ^{2}$ , then for any $\delta >0$ , we have $\max(X_{1},\dots ,X_{n})\leq \sigma {\sqrt {2\ln {\frac {n}{\delta }}}}$ with probability $\geq 1-\delta$ .

Proof. By the Chernoff bound, $Pr(X_{i}\geq \sigma {\sqrt {2\ln(n/\delta )}})\leq \delta /n$ . Now apply the union bound.
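To see the union-bound argument at work, the sketch below evaluates the threshold $\sigma {\sqrt {2\ln(n/\delta )}}$ for Gaussian variables ${\mathcal {N}}(0,\sigma ^{2})$ , whose variance proxy is $\sigma ^{2}$ , and bounds the failure probability by $n$ times the exact one-sided Gaussian tail. The values of $\sigma$ , $n$ and $\delta$ are hypothetical.

```python
import math

def max_threshold(sigma: float, n: int, delta: float) -> float:
    """Threshold sigma * sqrt(2 * ln(n / delta)) from the proposition."""
    return sigma * math.sqrt(2.0 * math.log(n / delta))

def gaussian_upper_tail(t: float, sigma: float) -> float:
    """Exact P(X >= t) for X ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

sigma, n, delta = 1.0, 1000, 0.05     # hypothetical values
thr = max_threshold(sigma, n, delta)
# Union bound: P(max_i X_i >= thr) <= n * P(X_1 >= thr); no independence is needed.
union_bound = n * gaussian_upper_tail(thr, sigma)
print(thr, union_bound, union_bound <= delta)
```

For Gaussian variables the union bound comes out well below $\delta$ , because the Chernoff bound used in the proof is not tight for the exact Gaussian tail.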

Proposition. (Exercise 2.5.10 ^[2]) If $X_{1},X_{2},\dots$ are subgaussians, with $\|X_{i}\|_{\psi _{2}}\leq K$ , then

E\left[\sup _{n}{\frac {|X_{n}|}{\sqrt {1+\ln n}}}\right]\lesssim K,\quad E\left[\max _{1\leq n\leq N}|X_{n}|\right]\lesssim K{\sqrt {\ln N}}

Further, the bound is sharp, since when $X_{1},X_{2},\dots$ are IID samples of ${\mathcal {N}}(0,1)$ we have $E\left[\max _{1\leq n\leq N}|X_{n}|\right]\gtrsim {\sqrt {\ln N}}$ .^[10]^[11]

Theorem. (over a finite set) If $X_{1},\dots ,X_{n}$ are subgaussian, with $\|X_{i}\|_{vp}^{2}\leq \sigma ^{2}$ , then

{\begin{aligned}E[\max _{i}(X_{i}-E[X_{i}])]\leq \sigma {\sqrt {2\ln n}},&\quad P(\max _{i}X_{i}>t)\leq ne^{-{\frac {t^{2}}{2\sigma ^{2}}}},\\E[\max _{i}|X_{i}-E[X_{i}]|]\leq \sigma {\sqrt {2\ln(2n)}},&\quad P(\max _{i}|X_{i}|>t)\leq 2ne^{-{\frac {t^{2}}{2\sigma ^{2}}}}\end{aligned}}
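The first inequality can be compared with an exactly computable case. The sketch below assumes the $X_{i}$ are i.i.d. standard normal (so $\sigma =1$ ; the theorem itself does not require independence) and evaluates $E[\max _{i}X_{i}]=\int xn\varphi (x)\Phi (x)^{n-1}dx$ with the trapezoid rule, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The integration range and step count are arbitrary choices.

```python
import math

def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_max_gaussian(n: int, lim: float = 10.0, steps: int = 20_000) -> float:
    """E[max of n i.i.d. N(0, 1)], by the trapezoid rule on [-lim, lim]."""
    h = 2.0 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        x = -lim + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * x * n * std_normal_pdf(x) * std_normal_cdf(x) ** (n - 1)
    return total * h

def finite_max_bound(sigma: float, n: int) -> float:
    """Upper bound sigma * sqrt(2 * ln n) from the theorem."""
    return sigma * math.sqrt(2.0 * math.log(n))

# Map n -> (numerically computed expectation, bound from the theorem).
checks = {n: (expected_max_gaussian(n), finite_max_bound(1.0, n)) for n in (2, 10, 100, 1000)}
for n, (value, bound) in checks.items():
    print(n, value, bound)
```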

Theorem. (over a convex polytope) Fix a finite set of vectors $v_{1},\dots ,v_{n}$ . If $X$ is a random vector such that each $\|v_{i}^{T}X\|_{vp}^{2}\leq \sigma ^{2}$ , then the above 4 inequalities hold, with $\max _{v\in \mathrm {conv} (v_{1},\dots ,v_{n})}v^{T}X$ replacing $\max _{i}X_{i}$ .

Here, $\mathrm {conv} (v_{1},\dots ,v_{n})$ is the convex polytope spanned by the vectors $v_{1},\dots ,v_{n}$ .

Theorem. (over a ball) If $X$ is a random vector in $\mathbb {R} ^{d}$ , such that $\|v^{T}X\|_{vp}^{2}\leq \sigma ^{2}$ for all $v$ on the unit sphere $S$ , then

E[\max _{v\in S}v^{T}X]=E[\max _{v\in S}|v^{T}X|]\leq 4\sigma {\sqrt {d}}

For any $\delta >0$ , with probability at least $1-\delta$ ,

\max _{v\in S}v^{T}X=\max _{v\in S}|v^{T}X|\leq 4\sigma {\sqrt {d}}+2\sigma {\sqrt {2\log(1/\delta )}}

Inequalities

Theorem. (Theorem 2.6.1 ^[2]) There exists a positive constant $C$ such that given any number of independent mean-zero subgaussian random variables $X_{1},\dots ,X_{n}$ ,

\left\|\sum _{i=1}^{n}X_{i}\right\|_{\psi _{2}}^{2}\leq C\sum _{i=1}^{n}\left\|X_{i}\right\|_{\psi _{2}}^{2}

Theorem. (Hoeffding's inequality) (Theorem 2.6.3 ^[2]) There exists a positive constant $c$ such that given any number of independent mean-zero subgaussian random variables $X_{1},\dots ,X_{N}$ ,

\mathbb {P} \left(\left|\sum _{i=1}^{N}X_{i}\right|\geq t\right)\leq 2\exp \left(-{\frac {ct^{2}}{\sum _{i=1}^{N}\left\|X_{i}\right\|_{\psi _{2}}^{2}}}\right)\quad \forall t>0

Theorem. (Bernstein's inequality) (Theorem 2.8.1 ^[2]) There exists a positive constant $c$ such that given any number of independent mean-zero subexponential random variables $X_{1},\dots ,X_{N}$ ,

\mathbb {P} \left(\left|\sum _{i=1}^{N}X_{i}\right|\geq t\right)\leq 2\exp \left(-c\min \left({\frac {t^{2}}{\sum _{i=1}^{N}\left\|X_{i}\right\|_{\psi _{1}}^{2}}},{\frac {t}{\max _{i}\left\|X_{i}\right\|_{\psi _{1}}}}\right)\right)

Theorem. (Khinchine inequality) (Exercise 2.6.5 ^[2]) There exists a positive constant $C$ such that given any number of independent mean-zero variance-one subgaussian random variables $X_{1},\dots ,X_{N}$ , any $p\geq 2$ , and any $a_{1},\dots ,a_{N}\in \mathbb {R}$ ,

\left(\sum _{i=1}^{N}a_{i}^{2}\right)^{1/2}\leq \left\|\sum _{i=1}^{N}a_{i}X_{i}\right\|_{L^{p}}\leq CK{\sqrt {p}}\left(\sum _{i=1}^{N}a_{i}^{2}\right)^{1/2}

where $K=\max _{i}\|X_{i}\|_{\psi _{2}}$ .

Hanson-Wright inequality

The Hanson-Wright inequality states that if a random vector $X$ is subgaussian in a certain sense, then any quadratic form $X^{T}AX$ of this vector, given by a matrix $A$ , is also subgaussian/subexponential. Further, the upper bound on the tail of $X^{T}AX$ is uniform.

A weak version of the following theorem was proved in (Hanson, Wright, 1971).^[12] There are many extensions and variants. Much like the central limit theorem, the Hanson-Wright inequality is more a cluster of theorems with the same purpose, than a single theorem. The purpose is to take a subgaussian vector and uniformly bound its quadratic forms.

Theorem.^[13]^[14] There exists a constant $c$ , such that:

Let $n$ be a positive integer. Let $X_{1},...,X_{n}$ be independent random variables, such that each satisfies $E[X_{i}]=0$ . Combine them into a random vector $X=(X_{1},\dots ,X_{n})$ . For any $n\times n$ matrix $A$ , we have

P(|X^{T}AX-E[X^{T}AX]|>t)\leq \max \left(2e^{-{\frac {ct^{2}}{K^{4}\|A\|_{F}^{2}}}},2e^{-{\frac {ct}{K^{2}\|A\|}}}\right)=2\exp \left[-c\min \left({\frac {t^{2}}{K^{4}\|A\|_{F}^{2}}},{\frac {t}{K^{2}\|A\|}}\right)\right]

where $K=\max _{i}\|X_{i}\|_{\psi _{2}}$ , $\|A\|_{F}={\sqrt {\sum _{ij}A_{ij}^{2}}}$ is the Frobenius norm of the matrix, and $\|A\|=\max _{\|x\|_{2}=1}\|Ax\|_{2}$ is the operator norm of the matrix.

In words, the quadratic form $X^{T}AX$ has its tail uniformly bounded by an exponential, or a gaussian, whichever is larger.

In the statement of the theorem, the constant $c$ is an "absolute constant", meaning that it has no dependence on $n,X_{1},\dots ,X_{n},A$ . It is a mathematical constant much like pi and e.

Consequences

Theorem (subgaussian concentration).^[13] There exists a constant $c$ , such that:

Let $n,m$ be positive integers. Let $X_{1},...,X_{n}$ be independent random variables, such that each satisfies $E[X_{i}]=0,E[X_{i}^{2}]=1$ . Combine them into a random vector $X=(X_{1},\dots ,X_{n})$ . For any $m\times n$ matrix $A$ , we have

P(|\|AX\|_{2}-\|A\|_{F}|>t)\leq 2e^{-{\frac {ct^{2}}{K^{4}\|A\|^{2}}}}

In words, the random vector $AX$ is concentrated on a spherical shell of radius $\|A\|_{F}$ , such that $\|AX\|_{2}-\|A\|_{F}$ is subgaussian, with subgaussian norm $\leq {\sqrt {3/c}}\|A\|K^{2}$ .

Notes

[1] Wainwright MJ. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge: Cambridge University Press; 2019. doi:10.1017/9781108627771, ISBN 9781108627771.
[2] Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science. Cambridge: Cambridge University Press.
[3] Kahane, J.P. (1960). "Propriétés locales des fonctions à séries de Fourier aléatoires". Studia Mathematica. 19: 1–25. doi:10.4064/sm-19-1-1-25.
[4] Buldygin, V.V.; Kozachenko, Yu.V. (1980). "Sub-Gaussian random variables". Ukrainian Mathematical Journal. 32 (6): 483–489. doi:10.1007/BF01087176.
[5] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. (2023-08-03), Strictly subgaussian probability distributions, arXiv:2308.01749
[6] Olivier Marchal and Julyan Arbel. On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1--14, 2017, doi:10.1214/17-ECP92.
[7] Julyan Arbel, Olivier Marchal, and Hien D Nguyen. On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables. ESAIM: Probability & Statistics, 24:39--55, 2020, doi:10.1051/ps/2019018.
[8] Mathias Barreto, Olivier Marchal, and Julyan Arbel. Optimal sub-Gaussian variance proxy for truncated Gaussian and exponential random variables, 2024, doi:10.48550/arXiv.2403.08628.
[9] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. (2023-08-03), Strictly subgaussian probability distributions, arXiv:2308.01749
[10] Kamath, Gautam. "Bounds on the expectation of the maximum of samples from a gaussian." (2015)
[11] "MIT 18.S997 | Spring 2015 | High-Dimensional Statistics, Chapter 1. Sub-Gaussian Random Variables" (PDF). MIT OpenCourseWare. Retrieved 2024-04-03.
[12] Hanson, D. L.; Wright, F. T. (1971). "A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables". The Annals of Mathematical Statistics. 42 (3): 1079–1083. doi:10.1214/aoms/1177693335. ISSN 0003-4851. JSTOR 2240253.
[13] Rudelson, Mark; Vershynin, Roman (January 2013). "Hanson-Wright inequality and sub-gaussian concentration". Electronic Communications in Probability. 18 (none): 1–9. arXiv:1306.2872. doi:10.1214/ECP.v18-2865. ISSN 1083-589X.
[14] Vershynin, Roman (2018). "6. Quadratic Forms, Symmetrization, and Contraction". High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. pp. 127–146. doi:10.1017/9781108231596.009. ISBN 978-1-108-41519-4.

References

Kahane, J.P. (1960). "Propriétés locales des fonctions à séries de Fourier aléatoires". Studia Mathematica. 19: 1–25. doi:10.4064/sm-19-1-1-25.
Buldygin, V.V.; Kozachenko, Yu.V. (1980). "Sub-Gaussian random variables". Ukrainian Mathematical Journal. 32 (6): 483–489. doi:10.1007/BF01087176.
Ledoux, Michel; Talagrand, Michel (1991). Probability in Banach Spaces. Springer-Verlag.
Stromberg, K.R. (1994). Probability for Analysts. Chapman & Hall/CRC.
Litvak, A.E.; Pajor, A.; Rudelson, M.; Tomczak-Jaegermann, N. (2005). "Smallest singular value of random matrices and geometry of random polytopes" (PDF). Advances in Mathematics. 195 (2): 491–523. doi:10.1016/j.aim.2004.08.004.
Rudelson, Mark; Vershynin, Roman (2010). "Non-asymptotic theory of random matrices: extreme singular values". Proceedings of the International Congress of Mathematicians 2010. pp. 1576–1602. arXiv:1003.2990. doi:10.1142/9789814324359_0111.
Rivasplata, O. (2012). "Subgaussian random variables: An expository note" (PDF). Unpublished.
Vershynin, R. (2018). "High-dimensional probability: An introduction with applications in data science" (PDF). Volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
Zajkowski, K. (2020). "On norms in some class of exponential type Orlicz spaces of random variables". Positivity. An International Mathematics Journal Devoted to Theory and Applications of Positivity. 24 (5): 1231–1240. arXiv:1709.02970. doi:10.1007/s11117-019-00729-6.
