Given a known input signal $s[n]$, the output of an unknown LTI system $x[n]$ can be expressed as:
$$x[n] = \sum_{k=0}^{N-1} h_k\, s[n-k] + w[n]$$
where $h_k$ are the unknown filter tap coefficients and $w[n]$ is noise.
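As a concrete illustration, here is a minimal NumPy sketch of this setup; the taps in h, the noise scale, and the signal length are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4                                   # filter order (number of taps)
h = np.array([0.8, -0.3, 0.2, 0.1])     # the "unknown" system taps, fixed for simulation
s = rng.standard_normal(1000)           # known input signal s[n]
w = 0.01 * rng.standard_normal(1000)    # additive noise w[n]

# x[n] = sum_k h_k * s[n-k] + w[n]; truncate the full convolution to len(s)
x = np.convolve(s, h)[: len(s)] + w
```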
The model system $\hat{x}[n]$, using a Wiener filter solution of order $N$, can be expressed as:
$$\hat{x}[n] = \sum_{k=0}^{N-1} \hat{h}_k\, s[n-k]$$
where $\hat{h}_k$ are the filter tap coefficients to be determined.
The error between the model and the unknown system can be expressed as:
$$e[n] = x[n] - \hat{x}[n]$$
The total squared error $E$ can be expressed as:
$$E = \sum_{n=-\infty}^{\infty} e[n]^2$$
$$E = \sum_{n=-\infty}^{\infty} \left( x[n] - \hat{x}[n] \right)^2$$
$$E = \sum_{n=-\infty}^{\infty} \left( x[n]^2 - 2\,x[n]\,\hat{x}[n] + \hat{x}[n]^2 \right)$$
Apply the minimum mean-square error criterion over all of $n$ by setting the gradient of $E$ to zero:

$$\nabla E = 0$$
which is

$$\frac{\partial E}{\partial \hat{h}_i} = 0$$
for all $i = 0, 1, 2, \dots, N-1$
$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[ x[n]^2 - 2\,x[n]\,\hat{x}[n] + \hat{x}[n]^2 \right]$$
Substitute the definition of $\hat{x}[n]$:
$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[ x[n]^2 - 2\,x[n] \sum_{k=0}^{N-1} \hat{h}_k\, s[n-k] + \left( \sum_{k=0}^{N-1} \hat{h}_k\, s[n-k] \right)^{2} \right]$$
Distribute the partial derivative:
$$\frac{\partial E}{\partial \hat{h}_i} = \sum_{n=-\infty}^{\infty} \left[ -2\,x[n]\, s[n-i] + 2 \left( \sum_{k=0}^{N-1} \hat{h}_k\, s[n-k] \right) s[n-i] \right]$$
Using the definition of the discrete cross-correlation:

$$R_{xy}[i] = \sum_{n=-\infty}^{\infty} x[n]\, y[n-i]$$
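For finite-length signals the infinite sum truncates to the samples where both signals exist; a direct (if slow) sketch of this definition, assuming NumPy arrays:

```python
def cross_corr(x, y, i):
    """R_xy[i] = sum_n x[n] * y[n-i], over indices where both samples exist."""
    return sum(x[n] * y[n - i] for n in range(len(x)) if 0 <= n - i < len(y))
```

The autocorrelation $R_{ss}$ is the special case where $s$ is passed for both arguments; in practice `np.correlate` is typically used instead, with attention to its lag convention.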
$$\frac{\partial E}{\partial \hat{h}_i} = -2\,R_{xs}[i] + 2 \sum_{k=0}^{N-1} \hat{h}_k\, R_{ss}[i-k] = 0$$
Rearrange the terms:
$$R_{xs}[i] = \sum_{k=0}^{N-1} \hat{h}_k\, R_{ss}[i-k]$$
for all $i = 0, 1, 2, \dots, N-1$.
This is a system of $N$ equations in $N$ unknowns, from which the coefficients $\hat{h}_k$ can be determined.
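Writing these equations out, and using the symmetry $R_{ss}[-k] = R_{ss}[k]$ of the autocorrelation of a real signal, the system takes a Toeplitz matrix form:

$$\begin{bmatrix} R_{ss}[0] & R_{ss}[1] & \cdots & R_{ss}[N-1] \\ R_{ss}[1] & R_{ss}[0] & \cdots & R_{ss}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ R_{ss}[N-1] & R_{ss}[N-2] & \cdots & R_{ss}[0] \end{bmatrix} \begin{bmatrix} \hat{h}_0 \\ \hat{h}_1 \\ \vdots \\ \hat{h}_{N-1} \end{bmatrix} = \begin{bmatrix} R_{xs}[0] \\ R_{xs}[1] \\ \vdots \\ R_{xs}[N-1] \end{bmatrix}$$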
Solving the system yields the Wiener filter coefficients:

$$\hat{h} = R_{ss}^{-1} P_{xs}$$

where $R_{ss}$ is the autocorrelation matrix of $s$ shown above and $P_{xs} = \left( R_{xs}[0], \dots, R_{xs}[N-1] \right)^{T}$ is the cross-correlation vector between $x$ and $s$.
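Continuing the simulated example and the cross_corr helper above, a sketch of solving this system numerically (scipy.linalg.solve_toeplitz exploits the Toeplitz structure; np.linalg.solve on the explicit matrix would also work):

```python
from scipy.linalg import solve_toeplitz

# Sample estimates of the correlations at lags 0..N-1,
# using the cross_corr helper defined above.
r_ss = np.array([cross_corr(s, s, lag) for lag in range(N)])
p_xs = np.array([cross_corr(x, s, lag) for lag in range(N)])

h_hat = solve_toeplitz(r_ss, p_xs)   # solves the Toeplitz system R_ss @ h_hat = p_xs
print(np.round(h_hat, 2))            # approximately the true taps [0.8, -0.3, 0.2, 0.1]
```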
By relaxing the infinite sum of the Wiener filter to just the error at time $n$, the LMS algorithm can be derived. Note the change of notation in what follows: $x[n]$ now denotes the filter input, $d[n]$ the desired signal, and $y[n] = \sum_{k=0}^{N-1} \hat{w}_k\, x[n-k]$ the filter output with tap weights $\hat{w}_k$.
The squared error can be expressed as:
$$E = \left( d[n] - y[n] \right)^2$$
Using the minimum mean-square error criterion, take the gradient:
$$\frac{\partial E}{\partial w} = \frac{\partial}{\partial w} \left( d[n] - y[n] \right)^2$$
Apply the chain rule and substitute the definition of $y[n]$:
$$\frac{\partial E}{\partial w} = 2 \left( d[n] - y[n] \right) \frac{\partial}{\partial w} \left( d[n] - \sum_{k=0}^{N-1} \hat{w}_k\, x[n-k] \right)$$
The derivative of the bracketed term with respect to $\hat{w}_i$ is $-x[n-i]$, and $e[n] = d[n] - y[n]$, so:

$$\frac{\partial E}{\partial w_i} = -2\,e[n]\,x[n-i]$$
Using gradient descent and a step size $\mu$:
$$w[n+1] = w[n] - \mu \frac{\partial E}{\partial w}$$
which becomes, for $i = 0, 1, \dots, N-1$,
$$w_i[n+1] = w_i[n] + 2\mu\, e[n]\, x[n-i]$$
This is the LMS update equation.
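A compact sketch of this update in code, under the LMS notation above; the step size mu is an illustrative choice and must be small enough for the weights to converge:

```python
import numpy as np

def lms(x, d, N, mu):
    """LMS adaptive FIR filter: returns final weights and the error signal."""
    w = np.zeros(N)                          # tap weights w_0 ... w_{N-1}
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        x_vec = x[n - N + 1 : n + 1][::-1]   # [x[n], x[n-1], ..., x[n-N+1]]
        y = w @ x_vec                        # y[n] = sum_k w_k x[n-k]
        e[n] = d[n] - y                      # error e[n]
        w = w + 2 * mu * e[n] * x_vec        # w_i[n+1] = w_i[n] + 2*mu*e[n]*x[n-i]
    return w, e
```

Applied to the system-identification example above, `w_est, e = lms(s, x, N=4, mu=0.01)` drives the weights toward the unknown taps h; in practice the factor of 2 is often absorbed into the step size.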