Prev Next Index-> contents reference index search external Up-> CppAD AD ADValued atomic atomic_base atomic_eigen_cholesky.cpp cholesky_theory atomic-> checkpoint atomic_base atomic_base-> atomic_ctor atomic_option atomic_afun atomic_forward atomic_reverse atomic_for_sparse_jac atomic_rev_sparse_jac atomic_for_sparse_hes atomic_rev_sparse_hes atomic_base_clear atomic_get_started.cpp atomic_norm_sq.cpp atomic_reciprocal.cpp atomic_set_sparsity.cpp atomic_tangent.cpp atomic_eigen_mat_mul.cpp atomic_eigen_mat_inv.cpp atomic_eigen_cholesky.cpp atomic_mat_mul.cpp atomic_eigen_cholesky.cpp-> cholesky_theory atomic_eigen_cholesky.hpp cholesky_theory Headings-> Reference Notation ---..Cholesky Factor ---..Taylor Coefficient ---..Lower Triangular Part Forward Mode Lemma 1 ---..Proof Lemma 2 Reverse Mode ---..Case k = 0 ---..Case k > 0

AD Theory for Cholesky Factorization

Reference
See section 3.6 of Sebastian F. Walter's Ph.D. thesis, Structured Higher-Order Algorithmic Differentiation in the Forward and Reverse Mode with Application in Optimum Experimental Design , Humboldt-Universitat zu Berlin, 2011.

Notation

Cholesky Factor
We are given a positive definite symmetric matrix $A \in \B{R}^{n \times n}$ and a Cholesky factorization $$A = L L^\R{T}$$ where $L \in \B{R}^{n \times n}$ is lower triangular.

Taylor Coefficient
The matrix $A$ is a function of a scalar argument $t$. For $k = 0 , \ldots , K$, we use $A_k$ for the corresponding Taylor coefficients; i.e., $$A(t) = o( t^K ) + \sum_{k = 0}^K A_k t^k$$ where $o( t^K ) / t^K \rightarrow 0$ as $t \rightarrow 0$. We use a similar notation for $L(t)$.

Lower Triangular Part
For a square matrix $C$, $\R{lower} (C)$ is the lower triangular part of $C$, $\R{diag} (C)$ is the diagonal matrix with the same diagonal as $C$ and $$\R{low} ( C ) = \R{lower} (C) - \frac{1}{2} \R{diag} (C)$$

Forward Mode
For Taylor coefficient order $k = 0 , \ldots , K$ the coefficients $A_k \in \B{R}^{n \times n}$, and satisfy the equation $$A_k = \sum_{\ell=0}^k L_\ell L_{k-\ell}^\R{T}$$ In the case where $k=0$, the $$A_0 = L_0 L_0^\R{T}$$ The value of $L_0$ can be computed using the Cholesky factorization. In the case where $k > 0$, $$A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k$$ where $$B_k = \sum_{\ell=1}^{k-1} L_\ell L_{k-\ell}^\R{T}$$ Note that $B_k$ is defined in terms of Taylor coefficients of $L(t)$ that have order less than $k$. We also note that $$L_0^{-1} ( A_k - B_k ) L_0^\R{-T} = L_0^{-1} L_k + L_k^\R{T} L_0^\R{-T}$$ The first matrix on the right hand side is lower triangular, the second is upper triangular, and the diagonals are equal. It follows that $$L_0^{-1} L_k = \R{low} [ L_0^{-1} ( A_k - B_k ) L_0^\R{-T} ]$$ $$L_k = L_0 \R{low} [ L_0^{-1} ( A_k - B_k ) L_0^\R{-T} ]$$ This expresses $L_k$ in term of the Taylor coefficients of $A(t)$ and the lower order coefficients of $L(t)$.

Lemma 1
We use the notation $\dot{C}$ for the derivative of a matrix valued function $C(s)$ with respect to a scalar argument $s$. We use the notation $\bar{S}$ and $\bar{L}$ for the partial derivative of a scalar value function $\bar{F}( S, L)$ with respect to a symmetric matrix $S$ and an lower triangular matrix $L$. Define the scalar valued function $$\hat{F}( C ) = \bar{F} [ S , \hat{L} (S) ]$$ We use $\hat{S}$ for the total derivative of $\hat{F}$ with respect to $S$. Suppose that $\hat{L} ( S )$ is such that $$\dot{L} = L_0 \R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} )$$ for any $S(s)$. It follows that $$\hat{S} = \bar{S} + \frac{1}{2} ( M + M^\R{T} )$$ where $$M = L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1}$$

Proof
$$\partial_s \hat{F} [ S(s) , L(s) ] = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} ( \bar{L}^\R{T} \dot{L} )$$$$\R{tr} ( \bar{L}^\R{T} \dot{L} ) = \R{tr} [ \bar{L}^\R{T} L_0 \R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} ) ]$$$$= \R{tr} [ \R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} )^\R{T} L_0^\R{T} \bar{L} ]$$$$= \R{tr} [ L_0^{-1} \dot{S} L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} ) ]$$$$= \R{tr} [ L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} ) L_0^{-1} \dot{S} ]$$$$\partial_s \hat{F} [ S(s) , L(s) ] = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} [ L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} ) L_0^{-1} \dot{S} ]$$We now consider the $(i, j)$ component function, for a symmetric matrix $S(s)$, defined by $$S_{k, \ell} (s) = \left\{ \begin{array}{ll} 1 & \R{if} \; k = i \; \R{and} \; \ell = j \\ 1 & \R{if} \; k = j \; \R{and} \; \ell = i \\ 0 & \R{otherwise} \end{array} \right\}$$ This shows that the formula in the lemma is correct for $\hat{S}_{i,j}$ and $\hat{S}_{j,i}$. This completes the proof because the component $(i, j)$ was arbitrary.

Lemma 2
We use the same assumptions as in Lemma 1 except that the matrix $S$ is lower triangular (instead of symmetric). It follows that $$\hat{S} = \bar{S} + \R{lower}(M)$$ where $$M = L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1}$$ The proof of this lemma is identical to Lemma 2 except that component function is defined by $$S_{k, \ell} (s) = \left\{ \begin{array}{ll} 1 & \R{if} \; k = i \; \R{and} \; \ell = j \\ 0 & \R{otherwise} \end{array} \right\}$$

Reverse Mode

Case k = 0
For the case $k = 0$, $$\dot{A}_0 = \dot{L}_0 L_0^\R{T} + L_0 \dot{L}_0^\R{T}$$ $$L_0^{-1} \dot{A}_0 L_0^\R{-T} = L_0^{-1} \dot{L}_0 + \dot{L}_0^\R{T} L_0^\R{-T}$$ $$\R{low} ( L_0^{-1} \dot{A}_0 L_0^\R{-T} ) = L_0^{-1} \dot{L}_0$$ $$\dot{L}_0 = L_0 \R{low} ( L_0^{-1} \dot{A}_0 L_0^\R{-T} )$$ It follows from Lemma 1 that $$\bar{A}_0 \stackrel{+}{=} \frac{1}{2} ( M + M^\R{T} )$$ where $$M = L_0^\R{-T} \R{low} ( L_0^\R{T} \bar{L}_0 )^\R{T} L_0^{-1}$$ and $\bar{A}_0$ is the partial before and after is before and after $L_0$ is removed from the scalar function dependency.

Case k > 0
In the case where $k > 0$, $$A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k$$ where $B_k$ is defined in terms of Taylor coefficients of $L(t)$ that have order less than $k$. It follows that $$\dot{L}_k L_0^\R{T} + L_0 \dot{L}_k^\R{T} = \dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T}$$ $$L_0^{-1} \dot{L}_k + \dot{L}_k^\R{T} L_0^\R{-T} = L_0^{-1} ( \dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T} ) L_0^\R{-T}$$ $$L_0^{-1} \dot{L}_k = \R{low} [ L_0^{-1} ( \dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T} ) L_0^\R{-T} ]$$ $$\dot{L}_k = L_0 \R{low} [ L_0^{-1} ( \dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T} ) L_0^\R{-T} ]$$ The matrix $A_k$ is symmetric, it follows that $$\bar{A}_k \stackrel{+}{=} \frac{1}{2} ( M_k + M_k^\R{T} )$$ where $$M_k = L_0^\R{-T} \R{low} ( L_0^\R{T} \bar{L}_k )^\R{T} L_0^{-1}$$ The matrix $B_k$ is also symmetric, hence $$\bar{B}_k = - \; \frac{1}{2} ( M_k + M_k^\R{T} )$$ We define the symmetric matrix $C_k (s)$ by $$\dot{C}_k = \dot{L}_0 L_k^\R{T} + L_k \dot{L}_0^\R{T}$$ and remove the dependency on $C_k$ with $$\R{tr}( \bar{C}_k^\R{T} \dot{C}_k ) = \R{tr}( \bar{B}_k^\R{T} \dot{C}_k ) = \R{tr}( \bar{B}_k^\R{T} \dot{L}_0 L_k^\R{T} ) + \R{tr}( \bar{B}_k^\R{T} L_k \dot{L}_0^\R{T} )$$ $$= \R{tr}( L_k^\R{T} \bar{B}_k^\R{T} \dot{L}_0 ) + \R{tr}( L_k^\R{T} \bar{B}_k \dot{L}_0 )$$ $$= \R{tr}[ L_k^\R{T} ( \bar{B}_k + \bar{B}_k^\R{T} ) \dot{L}_0 ]$$ Thus, removing $C_k$ from the dependency results in the following update to $\bar{L}_0$: $$\bar{L}_0 \stackrel{+}{=} \R{lower} [ ( \bar{B}_k + \bar{B}_k^\R{T} ) L_k ]$$ which is the same as $$\bar{L}_0 \stackrel{+}{=} 2 \; \R{lower} [ \bar{B}_k L_k ]$$ We still need to remove $B_k$ from the dependency. It follows from its definition that $$\dot{B}_k = \sum_{\ell=1}^{k-1} \dot{L}_\ell L_{k-\ell}^\R{T} + L_\ell \dot{L}_{k-\ell}^\R{T}$$ $$\R{tr}( \bar{B}_k^\R{T} \dot{B}_k ) = \sum_{\ell=1}^{k-1} \R{tr}( \bar{B}_k^\R{T} \dot{L}_\ell L_{k-\ell}^\R{T} ) + \R{tr}( \bar{B}_k^\R{T} L_\ell \dot{L}_{k-\ell}^\R{T} )$$ $$= \sum_{\ell=1}^{k-1} \R{tr}( L_{k-\ell}^\R{T} \bar{B}_k^\R{T} \dot{L}_\ell ) + \sum_{\ell=1}^{k-1} \R{tr}( L_\ell^\R{T} \bar{B}_k \dot{L}_{k-\ell} )$$ We now use the fact that $\bar{B}_k$ is symmetric to conclude $$\R{tr}( \bar{B}_k^\R{T} \dot{B}_k ) = 2 \sum_{\ell=1}^{k-1} \R{tr}( L_{k-\ell}^\R{T} \bar{B}_k^\R{T} \dot{L}_\ell )$$ Each of the $\dot{L}_\ell$ matrices is lower triangular. Thus, removing $B_k$ from the dependency results in the following update for $\ell = 1 , \ldots , k-1$: $$\bar{L}_\ell \stackrel{+}{=} 2 \; \R{lower}( \bar{B}_k L_{k-\ell} )$$
Input File: omh/appendix/theory/cholesky.omh