An Important Reverse Mode Identity

reverse_identity

@(@\newcommand{\W}[1]{ \; #1 \; } \newcommand{\R}[1]{ {\rm #1} } \newcommand{\B}[1]{ {\bf #1} } \newcommand{\D}[2]{ \frac{\partial #1}{\partial #2} } \newcommand{\DD}[3]{ \frac{\partial^2 #1}{\partial #2 \partial #3} } \newcommand{\Dpow}[2]{ \frac{\partial^{#1}}{\partial {#2}^{#1}} } \newcommand{\dpow}[2]{ \frac{ {\rm d}^{#1}}{{\rm d}\, {#2}^{#1}} }@)@ An Important Reverse Mode Identity The theorem and the proof below is a restatement of the results on page 236 of Evaluating Derivatives .

Notation
Given a function @(@ f(u, v) @)@ where @(@ u \in B^n @)@ we use the notation @[@ \D{f}{u} (u, v) = \left[ \D{f}{u_1} (u, v) , \cdots , \D{f}{u_n} (u, v) \right] @]@

Reverse Sweep
When using reverse mode we are given a function @(@ F : B^n \rightarrow B^m @)@, a matrix of Taylor coefficients @(@ x \in B^{n \times p} @)@, and a weight vector @(@ w \in B^m @)@. We define the functions @(@ X : B \times B^{n \times p} \rightarrow B^n @)@, @(@ W : B \times B^{n \times p} \rightarrow B @)@, and @(@ W_j : B^{n \times p} \rightarrow B @)@ by @[@ \begin{array}{rcl} X(t , x) & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1} \\ W(t, x) & = & w_0 F_0 [X(t, x)] + \cdots + w_{m-1} F_{m-1} [X(t, x)] \\ W_j (x) & = & \frac{1}{j!} \Dpow{j}{t} W(0, x) \end{array} @]@ where @(@ x^{(j)} @)@ is the j-th column of @(@ x \in B^{n \times p} @)@. The theorem below implies that @[@ \D{ W_j }{ x^{(i)} } (x) = \D{ W_{j-i} }{ x^{(0)} } (x) @]@ A general reverse sweep calculates the values @[@ \D{ W_{p-1} }{ x^{(i)} } (x) \hspace{1cm} (i = 0 , \ldots , p-1) @]@ But the return values for a reverse sweep are specified in terms of the more useful values @[@ \D{ W_j }{ x^{(0)} } (x) \hspace{1cm} (j = 0 , \ldots , p-1) @]@

Theorem
Suppose that @(@ F : B^n \rightarrow B^m @)@ is a @(@ p @)@ times continuously differentiable function. Define the functions @(@ Z : B \times B^{n \times p} \rightarrow B^n @)@, @(@ Y : B \times B^{n \times p }\rightarrow B^m @)@, and @(@ y^{(j)} : B^{n \times p }\rightarrow B^m @)@ by @[@ \begin{array}{rcl} Z(t, x) & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1} \\ Y(t, x) & = & F [ Z(t, x) ] \\ y^{(j)} (x) & = & \frac{1}{j !} \Dpow{j}{t} Y(0, x) \end{array} @]@ where @(@ x^{(j)} @)@ denotes the j-th column of @(@ x \in B^{n \times p} @)@. It follows that for all @(@ i, j @)@ such that @(@ i \leq j < p @)@, @[@ \begin{array}{rcl} \D{ y^{(j)} }{ x^{(i)} } (x) & = & \D{ y^{(j-i)} }{ x^{(0)} } (x) \end{array} @]@

Proof
If follows from the definitions that @[@ \begin{array}{rclr} \D{ y^{(j)} }{ x^{(i)} } (x) & = & \frac{1}{j ! } \D{ }{ x^{(i)} } \left[ \Dpow{j}{t} (F \circ Z) (t, x) \right]_{t=0} \\ & = & \frac{1}{j ! } \left[ \Dpow{j}{t} \D{ }{ x^{(i)} } (F \circ Z) (t, x) \right]_{t=0} \\ & = & \frac{1}{j ! } \left\{ \Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] \right\}_{t=0} \end{array} @]@ For @(@ k > i @)@, the k-th partial of @(@ t^i @)@ with respect to @(@ t @)@ is zero. Thus, the partial with respect to @(@ t @)@ is given by @[@ \begin{array}{rcl} \Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] & = & \sum_{k=0}^i \left( \begin{array}{c} j \\ k \end{array} \right) \frac{ i ! }{ (i - k) ! } t^{i-k} \; \Dpow{j-k}{t} ( F^{(1)} \circ Z ) (t, x) \\ \left\{ \Dpow{j}{t} \left[ t^i ( F^{(1)} \circ Z ) (t, x) \right] \right\}_{t=0} & = & \left( \begin{array}{c} j \\ i \end{array} \right) i ! \Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x) \\ & = & \frac{ j ! }{ (j - i) ! } \Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x) \\ \D{ y^{(j)} }{ x^{(i)} } (x) & = & \frac{ 1 }{ (j - i) ! } \Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x) \end{array} @]@ Applying this formula to the case where @(@ j @)@ is replaced by @(@ j - i @)@ and @(@ i @)@ is replaced by zero, we obtain @[@ \D{ y^{(j-i)} }{ x^{(0)} } (x) = \frac{ 1 }{ (j - i) ! } \Dpow{j-i}{t} ( F^{(1)} \circ Z ) (t, x) = \D{ y^{(j)} }{ x^{(i)} } (x) @]@ which completes the proof

Input File: omh/appendix/theory/reverse_identity.omh