The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

$$
f(\boldsymbol{x}) = \boldsymbol{x}^{\mathrm{T}}\boldsymbol{A}\boldsymbol{x} - 2\boldsymbol{b}^{\mathrm{T}}\boldsymbol{x}.
$$
The conjugate direction method
In the conjugate direction method for minimizing

$$
f(\boldsymbol{x}) = \boldsymbol{x}^{\mathrm{T}}\boldsymbol{A}\boldsymbol{x} - 2\boldsymbol{b}^{\mathrm{T}}\boldsymbol{x},
$$
one starts with an initial guess $\boldsymbol{x}_0$ and the corresponding residual $\boldsymbol{r}_0 = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_0$, and computes the iterate and residual by the formulae
$$
\begin{aligned}
\alpha_i &= \frac{\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{r}_i}{\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i},\\
\boldsymbol{x}_{i+1} &= \boldsymbol{x}_i + \alpha_i\boldsymbol{p}_i,\\
\boldsymbol{r}_{i+1} &= \boldsymbol{r}_i - \alpha_i\boldsymbol{A}\boldsymbol{p}_i
\end{aligned}
$$
where $\boldsymbol{p}_0, \boldsymbol{p}_1, \boldsymbol{p}_2, \ldots$ are a series of mutually conjugate directions, i.e., $\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_j = 0$ for any $i \neq j$.
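The update formulae above can be sketched in a few lines of NumPy. This is a minimal illustration, not text from the article: the function name and the test matrix are assumptions, and the example exploits the fact that the eigenvectors of a symmetric positive-definite matrix are one valid choice of mutually conjugate directions.

```python
import numpy as np

def conjugate_directions(A, b, x0, P):
    """Minimize f(x) = x^T A x - 2 b^T x along the mutually A-conjugate
    columns of P, using the update formulae above."""
    x = x0.astype(float)
    r = b - A @ x                        # r_0 = b - A x_0
    for i in range(P.shape[1]):
        p = P[:, i]
        alpha = (p @ r) / (p @ (A @ p))  # alpha_i = p_i^T r_i / p_i^T A p_i
        x = x + alpha * p                # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * (A @ p)          # r_{i+1} = r_i - alpha_i A p_i
    return x

# The eigenvectors of a symmetric positive-definite A are mutually
# A-conjugate, so n steps reach the exact minimizer x = A^{-1} b.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)              # a generic SPD test matrix
b = rng.standard_normal(4)
_, P = np.linalg.eigh(A)                 # columns: orthonormal eigenvectors
x = conjugate_directions(A, b, np.zeros(4), P)
print(np.allclose(A @ x, b))             # → True
```

After all $n$ conjugate steps the residual vanishes, so the result solves $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ exactly (up to round-off).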
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions $\boldsymbol{p}_0, \boldsymbol{p}_1, \boldsymbol{p}_2, \ldots$. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination.
The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.
The general Arnoldi method
In the Arnoldi iteration, one starts with a vector $\boldsymbol{r}_0$ and gradually builds an orthonormal basis $\{\boldsymbol{v}_1, \boldsymbol{v}_2, \boldsymbol{v}_3, \ldots\}$ of the Krylov subspace

$$
\mathcal{K}(\boldsymbol{A}, \boldsymbol{r}_0) = \operatorname{span}\{\boldsymbol{r}_0, \boldsymbol{A}\boldsymbol{r}_0, \boldsymbol{A}^2\boldsymbol{r}_0, \ldots\}
$$
by defining $\boldsymbol{v}_i = \boldsymbol{w}_i / \lVert\boldsymbol{w}_i\rVert_2$ where

$$
\boldsymbol{w}_i = \begin{cases}
\boldsymbol{r}_0 & \text{if } i = 1,\\
\boldsymbol{A}\boldsymbol{v}_{i-1} - \sum_{j=1}^{i-1}\bigl(\boldsymbol{v}_j^{\mathrm{T}}\boldsymbol{A}\boldsymbol{v}_{i-1}\bigr)\boldsymbol{v}_j & \text{if } i > 1.
\end{cases}
$$
In other words, for $i > 1$, $\boldsymbol{v}_i$ is found by Gram–Schmidt orthogonalizing $\boldsymbol{A}\boldsymbol{v}_{i-1}$ against $\{\boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_{i-1}\}$ followed by normalization.
Put in matrix form, the iteration is captured by the equation

$$
\boldsymbol{A}\boldsymbol{V}_i = \boldsymbol{V}_{i+1}\tilde{\boldsymbol{H}}_i
$$
where

$$
\begin{aligned}
\boldsymbol{V}_i &= \begin{bmatrix}\boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_i\end{bmatrix},\\
\tilde{\boldsymbol{H}}_i &= \begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
 & h_{32} & h_{33} & \cdots & h_{3,i}\\
 & & \ddots & \ddots & \vdots\\
 & & & h_{i,i-1} & h_{i,i}\\
 & & & & h_{i+1,i}
\end{bmatrix}
= \begin{bmatrix}\boldsymbol{H}_i\\ h_{i+1,i}\boldsymbol{e}_i^{\mathrm{T}}\end{bmatrix}
\end{aligned}
$$
with

$$
h_{ji} = \begin{cases}
\boldsymbol{v}_j^{\mathrm{T}}\boldsymbol{A}\boldsymbol{v}_i & \text{if } j \leq i,\\
\lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if } j = i+1,\\
0 & \text{if } j > i+1.
\end{cases}
$$
When applying the Arnoldi iteration to solving linear systems, one starts with $\boldsymbol{r}_0 = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_0$, the residual corresponding to an initial guess $\boldsymbol{x}_0$. After each step of iteration, one computes $\boldsymbol{y}_i = \boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)$ and the new iterate $\boldsymbol{x}_i = \boldsymbol{x}_0 + \boldsymbol{V}_i\boldsymbol{y}_i$.
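The Arnoldi process and the solve described above can be sketched as follows. This is a minimal illustration under assumed names and an assumed test matrix; it builds $\boldsymbol{V}_{m+1}$ and $\tilde{\boldsymbol{H}}_m$ column by column and then forms $\boldsymbol{x}_m = \boldsymbol{x}_0 + \boldsymbol{V}_m\boldsymbol{H}_m^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)$.

```python
import numpy as np

def arnoldi_solve(A, b, x0, m):
    """Build an orthonormal Krylov basis V and the Hessenberg matrix
    H-tilde by the Arnoldi iteration, then solve H_m y = ||r_0||_2 e_1
    and form x_m = x_0 + V_m y."""
    n = A.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))               # H-tilde: (m+1) x m
    V[:, 0] = r0 / beta                    # v_1 = r_0 / ||r_0||_2
    for i in range(m):
        w = A @ V[:, i]                    # w_{i+1} = A v_i ...
        for j in range(i + 1):             # ... Gram-Schmidt against v_1..v_i
            H[j, i] = V[:, j] @ w
            w -= H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)    # h_{i+1,i} = ||w_{i+1}||_2
        if H[i + 1, i] < 1e-12:            # breakdown: Krylov space exhausted
            break
        V[:, i + 1] = w / H[i + 1, i]
    y = np.linalg.solve(H[:m, :m], beta * np.eye(m)[:, 0])
    return x0 + V[:, :m] @ y               # x_m = x_0 + V_m y_m

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                # a generic SPD test matrix
b = rng.standard_normal(5)
x = arnoldi_solve(A, b, np.zeros(5), 5)
print(np.allclose(A @ x, b))               # → True
```

With $m = n$ the Krylov subspace (generically) spans the whole space, so the computed iterate solves the system.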
The direct Lanczos method
For the rest of the discussion, we assume that $\boldsymbol{A}$ is symmetric positive-definite. With symmetry of $\boldsymbol{A}$, the upper Hessenberg matrix $\boldsymbol{H}_i = \boldsymbol{V}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{V}_i$ becomes symmetric and thus tridiagonal. It then can be more clearly denoted by
$$
\boldsymbol{H}_i = \begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
 & \ddots & \ddots & \ddots\\
 & & b_{i-1} & a_{i-1} & b_i\\
 & & & b_i & a_i
\end{bmatrix}.
$$
This enables a short three-term recurrence for $\boldsymbol{v}_i$ in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.
Since $\boldsymbol{A}$ is symmetric positive-definite, so is $\boldsymbol{H}_i$. Hence, $\boldsymbol{H}_i$ can be LU factorized without partial pivoting into

$$
\boldsymbol{H}_i = \boldsymbol{L}_i\boldsymbol{U}_i = \begin{bmatrix}
1\\
c_2 & 1\\
 & \ddots & \ddots\\
 & & c_{i-1} & 1\\
 & & & c_i & 1
\end{bmatrix}\begin{bmatrix}
d_1 & b_2\\
 & d_2 & b_3\\
 & & \ddots & \ddots\\
 & & & d_{i-1} & b_i\\
 & & & & d_i
\end{bmatrix}
$$
with convenient recurrences for $c_i$ and $d_i$:

$$
\begin{aligned}
c_i &= b_i/d_{i-1},\\
d_i &= \begin{cases}
a_1 & \text{if } i = 1,\\
a_i - c_i b_i & \text{if } i > 1.
\end{cases}
\end{aligned}
$$
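These recurrences can be verified numerically on a small example. The tridiagonal matrix below is an assumed illustration, not taken from the text:

```python
import numpy as np

# A numerical check (on an assumed example matrix) that the recurrences
#   d_1 = a_1,  c_i = b_i / d_{i-1},  d_i = a_i - c_i b_i  (i > 1)
# reproduce the pivot-free LU factors of a symmetric tridiagonal H_i.
a = np.array([4.0, 5.0, 6.0, 7.0])     # diagonal entries a_1..a_4
b = np.array([1.0, 2.0, 3.0])          # off-diagonal entries b_2..b_4
n = len(a)
H = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

c = np.zeros(n)                        # c[0] unused: c starts at c_2
d = np.zeros(n)
d[0] = a[0]                            # d_1 = a_1
for i in range(1, n):
    c[i] = b[i - 1] / d[i - 1]         # c_i = b_i / d_{i-1}
    d[i] = a[i] - c[i] * b[i - 1]      # d_i = a_i - c_i b_i

L = np.eye(n) + np.diag(c[1:], -1)     # unit lower bidiagonal factor
U = np.diag(d) + np.diag(b, 1)         # upper bidiagonal factor
print(np.allclose(L @ U, H))           # → True
```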
Rewrite $\boldsymbol{x}_i = \boldsymbol{x}_0 + \boldsymbol{V}_i\boldsymbol{y}_i$ as

$$
\begin{aligned}
\boldsymbol{x}_i &= \boldsymbol{x}_0 + \boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&= \boldsymbol{x}_0 + \boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&= \boldsymbol{x}_0 + \boldsymbol{P}_i\boldsymbol{z}_i
\end{aligned}
$$
with

$$
\begin{aligned}
\boldsymbol{P}_i &= \boldsymbol{V}_i\boldsymbol{U}_i^{-1},\\
\boldsymbol{z}_i &= \boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1).
\end{aligned}
$$
It is now important to observe that

$$
\begin{aligned}
\boldsymbol{P}_i &= \begin{bmatrix}\boldsymbol{P}_{i-1} & \boldsymbol{p}_i\end{bmatrix},\\
\boldsymbol{z}_i &= \begin{bmatrix}\boldsymbol{z}_{i-1}\\ \zeta_i\end{bmatrix}.
\end{aligned}
$$
In fact, there are short recurrences for $\boldsymbol{p}_i$ and $\zeta_i$ as well:

$$
\begin{aligned}
\boldsymbol{p}_i &= \frac{1}{d_i}(\boldsymbol{v}_i - b_i\boldsymbol{p}_{i-1}),\\
\zeta_i &= -c_i\zeta_{i-1}.
\end{aligned}
$$
With this formulation, we arrive at a simple recurrence for $\boldsymbol{x}_i$:

$$
\begin{aligned}
\boldsymbol{x}_i &= \boldsymbol{x}_0 + \boldsymbol{P}_i\boldsymbol{z}_i\\
&= \boldsymbol{x}_0 + \boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1} + \zeta_i\boldsymbol{p}_i\\
&= \boldsymbol{x}_{i-1} + \zeta_i\boldsymbol{p}_i.
\end{aligned}
$$
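Putting the three ingredients together, the whole method can be sketched in NumPy. This is a minimal illustration (names, tolerances, and the test matrix are assumptions): the three-term Lanczos recurrence produces $\boldsymbol{v}_i$, $a_i$, $b_{i+1}$; the LU recurrences produce $c_i$, $d_i$; and $\boldsymbol{p}_i$, $\zeta_i$ update the iterate without ever storing $\boldsymbol{V}_i$.

```python
import numpy as np

def direct_lanczos(A, b, x0, m):
    """Sketch of the direct Lanczos method: the three-term Lanczos
    recurrence for v_i, the running LU recurrences for c_i and d_i,
    and the updates p_i = (v_i - b_i p_{i-1}) / d_i,
    zeta_i = -c_i zeta_{i-1}, x_i = x_{i-1} + zeta_i p_i."""
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    v, v_prev = r0 / beta0, np.zeros_like(r0)   # v_1 (and v_0 = 0)
    p_prev = np.zeros_like(r0)                  # p_0 = 0
    b_i, d_prev, zeta = 0.0, 1.0, beta0         # zeta_1 = ||r_0||_2
    x = x0.astype(float)
    for i in range(1, m + 1):
        w = A @ v - b_i * v_prev                # Lanczos recurrence
        a_i = v @ w                             # a_i = v_i^T A v_i
        w = w - a_i * v
        if i == 1:
            d_i = a_i                           # d_1 = a_1
        else:
            c_i = b_i / d_prev                  # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i               # d_i = a_i - c_i b_i
            zeta = -c_i * zeta                  # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i            # p_i
        x = x + zeta * p                        # x_i = x_{i-1} + zeta_i p_i
        b_next = np.linalg.norm(w)              # b_{i+1}
        if b_next < 1e-12:                      # invariant subspace reached
            break
        v_prev, v = v, w / b_next
        p_prev, d_prev, b_i = p, d_i, b_next
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                     # a generic SPD test matrix
b = rng.standard_normal(6)
x = direct_lanczos(A, b, np.zeros(6), 6)
print(np.allclose(A @ x, b))                    # → True
```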
The relations above lead straightforwardly to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method.
The conjugate gradient method from imposing orthogonality and conjugacy
If we allow the direction $\boldsymbol{p}_i$ to be rescaled and compensate for the rescaling in the constant factors, we can potentially obtain simpler recurrences of the form:
$$
\begin{aligned}
\boldsymbol{x}_i &= \boldsymbol{x}_{i-1} + \alpha_{i-1}\boldsymbol{p}_{i-1},\\
\boldsymbol{r}_i &= \boldsymbol{r}_{i-1} - \alpha_{i-1}\boldsymbol{A}\boldsymbol{p}_{i-1},\\
\boldsymbol{p}_i &= \boldsymbol{r}_i + \beta_{i-1}\boldsymbol{p}_{i-1}.
\end{aligned}
$$
As premises for the simplification, we now derive the orthogonality of $\boldsymbol{r}_i$ and conjugacy of $\boldsymbol{p}_i$, i.e., for $i \neq j$,

$$
\begin{aligned}
\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{r}_j &= 0,\\
\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_j &= 0.
\end{aligned}
$$
The residuals are mutually orthogonal because $\boldsymbol{r}_i$ is essentially a multiple of $\boldsymbol{v}_{i+1}$: for $i = 0$, $\boldsymbol{r}_0 = \lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1$, and for $i > 0$,
$$
\begin{aligned}
\boldsymbol{r}_i &= \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_i\\
&= \boldsymbol{b} - \boldsymbol{A}(\boldsymbol{x}_0 + \boldsymbol{V}_i\boldsymbol{y}_i)\\
&= \boldsymbol{r}_0 - \boldsymbol{A}\boldsymbol{V}_i\boldsymbol{y}_i\\
&= \boldsymbol{r}_0 - \boldsymbol{V}_{i+1}\tilde{\boldsymbol{H}}_i\boldsymbol{y}_i\\
&= \boldsymbol{r}_0 - \boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i - h_{i+1,i}(\boldsymbol{e}_i^{\mathrm{T}}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&= \lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1 - \boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1) - h_{i+1,i}(\boldsymbol{e}_i^{\mathrm{T}}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&= -h_{i+1,i}(\boldsymbol{e}_i^{\mathrm{T}}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}.
\end{aligned}
$$
To see the conjugacy of $\boldsymbol{p}_i$, it suffices to show that $\boldsymbol{P}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{P}_i$ is diagonal:
$$
\begin{aligned}
\boldsymbol{P}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{P}_i &= \boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{V}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\\
&= \boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\
&= \boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\
&= \boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i
\end{aligned}
$$

is symmetric and lower triangular simultaneously and thus must be diagonal.
Now we can derive the constant factors $\alpha_i$ and $\beta_i$ with respect to the scaled $\boldsymbol{p}_i$ by solely imposing the orthogonality of $\boldsymbol{r}_i$ and conjugacy of $\boldsymbol{p}_i$.
Due to the orthogonality of $\boldsymbol{r}_i$, it is necessary that $\boldsymbol{r}_{i+1}^{\mathrm{T}}\boldsymbol{r}_i = (\boldsymbol{r}_i - \alpha_i\boldsymbol{A}\boldsymbol{p}_i)^{\mathrm{T}}\boldsymbol{r}_i = 0$. As a result,
$$
\begin{aligned}
\alpha_i &= \frac{\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{r}_i}{\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}\\
&= \frac{\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{r}_i}{(\boldsymbol{p}_i - \beta_{i-1}\boldsymbol{p}_{i-1})^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}\\
&= \frac{\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{r}_i}{\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}.
\end{aligned}
$$
Similarly, due to the conjugacy of $\boldsymbol{p}_i$, it is necessary that $\boldsymbol{p}_{i+1}^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i = (\boldsymbol{r}_{i+1} + \beta_i\boldsymbol{p}_i)^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i = 0$. As a result,
$$
\begin{aligned}
\beta_i &= -\frac{\boldsymbol{r}_{i+1}^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}{\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}\\
&= -\frac{\boldsymbol{r}_{i+1}^{\mathrm{T}}(\boldsymbol{r}_i - \boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^{\mathrm{T}}\boldsymbol{A}\boldsymbol{p}_i}\\
&= \frac{\boldsymbol{r}_{i+1}^{\mathrm{T}}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^{\mathrm{T}}\boldsymbol{r}_i}.
\end{aligned}
$$
This completes the derivation.
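The derived recurrences and coefficients assemble into the familiar conjugate gradient algorithm. The following is a minimal NumPy sketch (the function name and test matrix are assumptions), taking $\boldsymbol{p}_0 = \boldsymbol{r}_0$ as the conventional starting direction:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """CG with the derived coefficients: alpha_i = r_i^T r_i / p_i^T A p_i
    and beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i, starting from p_0 = r_0."""
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()                       # p_0 = r_0
    rs = r @ r
    for _ in range(len(b)):            # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)          # alpha_i
        x = x + alpha * p              # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap             # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs             # beta_i
        p = r + beta * p               # p_{i+1} = r_{i+1} + beta_i p_i
        rs = rs_new
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)            # a generic SPD test matrix
b = rng.standard_normal(6)
x = conjugate_gradient(A, b, np.zeros(6))
print(np.allclose(A @ x, b))           # → True
```

Note that each iteration needs only one matrix–vector product with $\boldsymbol{A}$ and a handful of vector operations, which is what makes the method attractive for large sparse systems.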