Prev Next reverse_identity

An Important Reverse Mode Identity
The theorem and the proof below is a restatement of the results on page 236 of Evaluating Derivatives .

Notation
Given a function  f(u, v) where  u \in B^n we use the notation  \[
\D{f}{u} (u, v) = \left[ \D{f}{u_1} (u, v) , \cdots , \D{f}{u_n} (u, v) \right]
\] 


Reverse Sweep
When using reverse mode we are given a function  F : B^n \rightarrow B^m , a matrix of Taylor coefficients  x \in B^{n \times p} , and a weight vector  w \in B^m . We define the functions  X : B \times B^{n \times p} \rightarrow B^n ,  W : B \times B^{n \times p} \rightarrow B , and  W_j : B^{n \times p} \rightarrow B by  \[
\begin{array}{rcl}
     X(t , x) & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1}
     \\
     W(t, x)   & = &  w_0 F_0 [X(t, x)] + \cdots + w_{m-1} F_{m-1} [X(t, x)]
     \\
     W_j (x)   & = & \frac{1}{j!} \Dpow{j}{t} W(0, x)
\end{array}
\]
where  x^{(j)} is the j-th column of  x \in B^{n \times p} . The theorem below implies that  \[
     \D{ W_j }{ x^{(i)} } (x) = \D{ W_{j-i} }{ x^{(0)} } (x) 
\] 
A general reverse sweep calculates the values  \[
     \D{ W_{p-1} }{ x^{(i)} } (x)  \hspace{1cm} (i = 0 , \ldots , p-1)
\] 
But the return values for a reverse sweep are specified in terms of the more useful values  \[
     \D{ W_j }{ x^{(0)} } (x)  \hspace{1cm} (j = 0 , \ldots , p-1)
\] 


Theorem
Suppose that  F : B^n \rightarrow B^m is a  p times continuously differentiable function. Define the functions  Z : B \times B^{n \times p} \rightarrow B^n ,  Y : B \times B^{n \times p }\rightarrow B^m , and  y^{(j)} : B^{n \times p }\rightarrow B^m by  \[
\begin{array}{rcl}
     Z(t, x)  & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1}
     \\
     Y(t, x)  & = & F [ Z(t, x) ]
     \\
     y^{(j)} (x) & = & \frac{1}{j !} \Dpow{j}{t} Y(0, x) 
\end{array}
\] 
where  x^{(j)} denotes the j-th column of  x \in B^{n \times p} . It follows that for all  i, j such that  i \leq j < p ,  \[
\begin{array}{rcl}
\D{ y^{(j)} }{ x^{(i)} } (x) & = & \D{ y^{(j-i)} }{ x^{(0)} } (x)
\end{array}
\] 


Proof
If follows from the definitions that  \[
\begin{array}{rclr}
\D{ y^{(j)} }{ x^{(i)} } (x)
& = & 
\frac{1}{j ! } \D{ }{ x^{(i)} } 
     \left[ \Dpow{j}{t} (F \circ  Z) (t, x)  \right]_{t=0}
\\
& = &
\frac{1}{j ! } \left[ \Dpow{j}{t} 
     \D{ }{ x^{(i)} } (F \circ  Z) (t, x) 
\right]_{t=0}
\\
& = &
\frac{1}{j ! } \left\{ 
     \Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] 
\right\}_{t=0}
\end{array}
\] 
For  k > i , the k-th partial of  t^i with respect to  t is zero. Thus, the partial with respect to  t is given by  \[
\begin{array}{rcl}
\Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] 
& = &
\sum_{k=0}^i 
\left( \begin{array}{c} j \\ k \end{array} \right)
\frac{ i ! }{ (i - k) ! } t^{i-k} \; 
\Dpow{j-k}{t} ( F^{(1)} \circ Z ) (t, x)
\\
\left\{ 
     \Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] 
\right\}_{t=0}
& = &
\left( \begin{array}{c} j \\ i \end{array} \right)
i ! \Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x)
\\
& = &
\frac{ j ! }{ (j - i) ! }
\Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x)
\\
\D{ y^{(j)} }{ x^{(i)} } (x)
& = & 
\frac{ 1 }{ (j - i) ! }
\Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x)
\end{array}
\] 
Applying this formula to the case where  j is replaced by  j - i and  i is replaced by zero, we obtain  \[
\D{ y^{(j-i)} }{ x^{(0)} } (x)
=
\frac{ 1 }{ (j - i) ! }
\Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x)
=
\D{ y^{(j)} }{ x^{(i)} } (x)
\] 
which completes the proof
Input File: omh/theory/reverse_identity.omh