User:Rvuchkov/sandbox

In algebraic statistics the concept of maximum likelihood degree (ML degree) arises naturally as the number of complex solutions of the likelihood equations. The ML degree is bounded by the degree of the likelihood ideal.

Introduction

A parametric probability model for a discrete random variable is given by a map $\psi :U\rightarrow \Delta _{n-1}$, where $U\subset \mathbb {R} ^{d}$ is an open set and $\Delta _{n-1}$ is the probability simplex $\Delta _{n-1}=\{(p_{1},p_{2},\dots ,p_{n}):p_{1}+p_{2}+\dots +p_{n}=1,\ p_{i}\geq 0\}$. The model is $\psi (U)$, where $\psi (\theta )=(\psi _{1}(\theta ),\dots ,\psi _{n}(\theta ))$ and $\theta =(\theta _{1},\dots ,\theta _{d})$. The problem of maximum likelihood estimation for a fixed data vector $u=(u_{1},\dots ,u_{n})$ is to find a parameter $\theta$ that best explains the data, which leads to the problem of maximizing

\psi _{1}(\theta )^{u_{1}}\dots \psi _{n}(\theta )^{u_{n}}

subject to

\psi _{1}+\dots +\psi _{n}=1

Equivalently, one can maximize the log-likelihood function

\sum _{i=1}^{n}u_{i}\log \psi _{i}

In the above definition, if $\psi _{1},\dots ,\psi _{n}$ are polynomials in $(\theta _{1},\dots ,\theta _{d})$, the Zariski closure of $\psi (U)$ is called an algebraic statistical model. To employ tools from algebraic geometry, we extend the domain of $\psi$ to the complex numbers, giving a map

\psi :\mathbb {C} ^{d}\rightarrow \mathbb {C} ^{n}

such that

\psi _{1}+\dots +\psi _{n}=1

The Maximum Likelihood Degree

The maximum likelihood degree (ML degree) of a discrete statistical model is the number of complex critical points of the log-likelihood function

\sum _{i=1}^{n}u_{i}\log \psi _{i}

for generic data $u=(u_{1},\dots ,u_{n})$.

Equivalently the ML degree is the number of complex solutions to the system of equations

{\begin{aligned}{\frac {\partial \sum _{i=1}^{n}u_{i}\log \psi _{i}}{\partial \theta _{1}}}&=N{\frac {\partial g}{\partial \theta _{1}}},\\\vdots \\{\frac {\partial \sum _{i=1}^{n}u_{i}\log \psi _{i}}{\partial \theta _{d}}}&=N{\frac {\partial g}{\partial \theta _{d}}}.\end{aligned}}

where $g=\psi _{1}+\dots +\psi _{n}$ and $N$ is the sample size $\sum _{i=1}^{n}u_{i}$.^[1]
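As a rough illustration of this definition, the following Python sketch counts the complex critical points of the log-likelihood for a one-parameter polynomial model. It assumes SymPy is available. The helper name `ml_degree_one_parameter`, the four-state linear model, and the data vector are hypothetical choices made for this illustration and are not taken from the cited papers. The sketch also assumes that the coordinates $\psi _{i}(\theta )$ sum to 1 identically, so that the derivatives of $g$ vanish and only the derivative of the log-likelihood has to be set to zero.

```python
import sympy as sp


def ml_degree_one_parameter(psi, theta, data):
    """Count distinct complex critical points of sum(u_i * log(psi_i))."""
    # Derivative of the log-likelihood, written without logarithms.
    score = sum(u * sp.diff(p, theta) / p for u, p in zip(data, psi))
    # Cancel common factors so that the roots of the numerator are
    # exactly the critical points (zeros of the psi_i stay in the denominator).
    numerator, _denominator = sp.fraction(sp.cancel(score))
    # The square-free part has one simple root per distinct complex root.
    return sp.Poly(numerator, theta).sqf_part().degree()


theta = sp.symbols("theta")

# Hypothetical linear model on four states; the coordinates sum to 1.
linear_model = [theta / 2, (1 - theta) / 2, (1 + theta) / 4, (1 - theta) / 4]
assert sp.simplify(sum(linear_model) - 1) == 0

# Arbitrary positive counts standing in for generic data.
data = (3, 5, 7, 11)
print(ml_degree_one_parameter(linear_model, theta, data))
```

For this model and data the sketch should report 2. The count is taken in the parameter $\theta$, so it agrees with the ML degree of the model only when the parametrization is generically one-to-one, and a single data vector is only a stand-in for generic data.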

Example

The Hardy-Weinberg curve has ML degree 1. It is parametrized by

{\begin{aligned}\psi _{0}(\theta )&=\theta ^{2},\\\psi _{1}(\theta )&=2\theta (1-\theta ),\\\psi _{2}(\theta )&=(1-\theta )^{2}.\end{aligned}}

where $\theta$ is the probability that a biased coin lands on tails. Suppose the coin is tossed twice; then $\psi _{i}(\theta )$ is the probability that exactly $i$ heads appear. Repeat this experiment $N$ times and record the data vector $u=(u_{0},u_{1},u_{2})$, where $u_{i}$ is the number of times $i$ heads appear. The maximum likelihood problem is to estimate the unknown parameter $\theta$ by maximizing the likelihood function

$\ell _{u_{0},u_{1},u_{2}}=\psi _{0}(\theta )^{u_{0}}\psi _{1}(\theta )^{u_{1}}\psi _{2}(\theta )^{u_{2}}$

It is often more convenient to work with the logarithm of the likelihood function. In general the log-likelihood can be maximized with Lagrange multipliers or other optimization tools, but this particular example needs no such machinery:

$\log \ell _{u_{0},u_{1},u_{2}}=(2u_{0}+u_{1})\log \theta +(u_{1}+2u_{2})\log(1-\theta )+u_{1}\log 2$

Setting the derivative with respect to $\theta$ to zero and clearing denominators produces the following polynomial equation (the likelihood equation).

$(2u_{0}+2u_{1}+2u_{2})\theta -(2u_{0}+u_{1})=0$

This polynomial has degree 1 in $\theta$, so it has exactly one complex solution, and the ML degree is 1.^[2] The example is notable because a generic quadric has ML degree 6, so the Hardy-Weinberg curve is far from generic. In general, solving the likelihood equations by hand can be difficult, and computer software such as Macaulay2, Singular, and Polymake may be helpful.
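The computation in this example can be checked symbolically. The sketch below assumes SymPy is available and uses illustrative variable names that are not part of the cited references. It differentiates the log-likelihood of the Hardy-Weinberg parametrization, clears denominators, and solves the resulting likelihood equation for $\theta$.

```python
import sympy as sp

theta = sp.symbols("theta")
u0, u1, u2 = sp.symbols("u0 u1 u2", positive=True)

# Hardy-Weinberg parametrization from the example above.
psi = [theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2]
counts = [u0, u1, u2]

# Derivative of the log-likelihood with respect to theta.
score = sum(u * sp.diff(p, theta) / p for u, p in zip(counts, psi))

# Clearing denominators leaves the likelihood equation numerator = 0.
numerator, denominator = sp.fraction(sp.cancel(score))
likelihood_degree = sp.degree(numerator, theta)
theta_hat = sp.solve(sp.Eq(numerator, 0), theta)

print(sp.factor(numerator), likelihood_degree, theta_hat)
```

The numerator should agree with the likelihood equation above up to a constant factor, and the single solution is $\hat{\theta }=(2u_{0}+u_{1})/(2u_{0}+2u_{1}+2u_{2})$.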

Birch's Theorem for Toric Ideals

Let $A\in \mathbb {N} ^{d\times k}$ be a matrix and let $u\in \mathbb {N} ^{k}$ be a vector of positive counts. The maximum likelihood estimate of the frequencies ${\hat {u}}$ in the log-linear model ${\mathcal {M}}_{A}$ is the unique non-negative solution to the simultaneous system of equations:

$A{\hat {u}}=Au$ and ${\hat {u}}\in V(I_{A})$

Note that the number of complex solutions to this system, that is, of ${\hat {u}}\in V(I_{A})$ with $A{\hat {u}}=Au$, is the ML degree.^[3]
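To make the two conditions concrete, the following Python sketch checks them for the independence model of a 2×2 contingency table. This model, the counts, and all names in the code are assumptions chosen for illustration and do not appear in the text above. Here $A$ records the row and column totals, $I_{A}$ is generated by the binomial ${\hat {u}}_{11}{\hat {u}}_{22}-{\hat {u}}_{12}{\hat {u}}_{21}$, and the estimate used is the standard closed form (row total × column total) / $N$.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts, flattened as (u11, u12, u21, u22).
u = [4, 2, 3, 11]

# Rows of A compute the sufficient statistics of the independence model.
A = [
    [1, 1, 0, 0],  # total of row 1
    [0, 0, 1, 1],  # total of row 2
    [1, 0, 1, 0],  # total of column 1
    [0, 1, 0, 1],  # total of column 2
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


N = sum(u)
row1, row2, col1, col2 = matvec(A, u)

# Closed-form estimate for independence: (row total * column total) / N.
u_hat = [Fraction(r * c, N) for r in (row1, row2) for c in (col1, col2)]

# First condition: the estimate has the same sufficient statistics as the data.
same_statistics = matvec(A, u_hat) == matvec(A, u)

# Second condition: the estimate lies on the toric variety V(I_A).
on_variety = u_hat[0] * u_hat[3] - u_hat[1] * u_hat[2] == 0

print(u_hat, same_statistics, on_variety)
```

Exact rational arithmetic is used so that both conditions can be tested with equality rather than a numerical tolerance.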

Figure: Newton polygon of $P(x,y)=3x^{2}y^{3}-xy^{2}+2x^{2}y^{2}-x^{3}y$, with positive monomials in red and negative monomials in cyan. Faces are labelled with the limiting terms they correspond to.

History

The study of the ML degree is relatively recent and began with two papers: "The Maximum Likelihood Degree" by Catanese, Hoşten, Khetan, and Sturmfels, and "Solving the Likelihood Equations" by Hoşten, Khetan, and Sturmfels. Both papers describe the connection between the ML degree and polytopes, in particular Newton polytopes.^[4]

References

^ Catanese, F., Hoşten, S., Khetan, A., & Sturmfels, B. (2006). The maximum likelihood degree. American Journal of Mathematics, 128(3), 671–697. Retrieved from http://www.jstor.org/stable/40067993
^ Huh, J., & Sturmfels, B. Likelihood geometry. arXiv.
^ Drton, M., Sturmfels, B., & Sullivant, S. (2008). Lectures on algebraic statistics (Vol. 39). Springer Science & Business Media.
^ Hoşten, S., Khetan, A., & Sturmfels, B. (2005). Solving the likelihood equations. Foundations of Computational Mathematics, 5(4), 389–407.

External links

Home page of Twenty-First Holiday Symposium: Gröbner Bases and Convex Polytopes
Home page of D. A. Cox, with several lectures on toric varieties