exp_eps: Second Order Reverse Sweep

@(@\newcommand{\W}[1]{ \; #1 \; } \newcommand{\R}[1]{ {\rm #1} } \newcommand{\B}[1]{ {\bf #1} } \newcommand{\D}[2]{ \frac{\partial #1}{\partial #2} } \newcommand{\DD}[3]{ \frac{\partial^2 #1}{\partial #2 \partial #3} } \newcommand{\Dpow}[2]{ \frac{\partial^{#1}}{\partial {#2}^{#1}} } \newcommand{\dpow}[2]{ \frac{ {\rm d}^{#1}}{{\rm d}\, {#2}^{#1}} }@)@ exp_eps: Second Order Reverse Sweep
Purpose
In general, a second order reverse sweep is given the first order expansion for all of the variables in an operation sequence. Given a choice of a particular variable, it computes the derivative, of that variables first order expansion coefficient, with respect to all of the independent variables.

Mathematical Form
Suppose that we use the algorithm exp_eps.hpp to compute exp_eps(x, epsilon) with x is equal to .5 and epsilon is equal to .2. For this case, the mathematical function for the operation sequence corresponding to exp_eps is @[@ f ( x , \varepsilon ) = 1 + x + x^2 / 2 @]@ The corresponding derivative of the partial derivative with respect to @(@ x @)@ is @[@ \begin{array}{rcl} \Dpow{2}{x} f ( x , \varepsilon ) & = & 1 \\ \partial_\varepsilon \partial_x f ( x , \varepsilon ) & = & 0 \end{array} @]@

epsilon
Since @(@ \varepsilon @)@ is an independent variable, it could included as an argument to all of the @(@ f_j @)@ functions below. The result would be that all the partials with respect to @(@ \varepsilon @)@ would be zero and hence we drop it to simplify the presentation.

f_7
In reverse mode we choose one dependent variable and compute its derivative with respect to all the independent variables. For our example, we chose the value returned by exp_eps.hpp which is @(@ v_7 @)@. We begin with the function @(@ f_7 @)@ where @(@ v_7 @)@ is both an argument and the value of the function; i.e., @[@ \begin{array}{rcl} f_7 \left( v_1^{(0)} , v_1^{(1)} , \ldots , v_7^{(0)} , v_7^{(1)} \right) & = & v_7^{(1)} \\ \D{f_7}{v_7^{(1)}} & = & 1 \end{array} @]@ All the other partial derivatives of @(@ f_7 @)@ are zero.

Index 7: f_6
The last operation has index 7, @[@ \begin{array}{rcl} v_7^{(0)} & = & v_4^{(0)} + v_6^{(0)} \\ v_7^{(1)} & = & v_4^{(1)} + v_6^{(1)} \end{array} @]@ We define the function @(@ f_6 \left( v_1^{(0)} , \ldots , v_6^{(1)} \right) @)@ as equal to @(@ f_7 @)@ except that @(@ v_7^{(0)} @)@ and @(@ v_7^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_6 = f_7 \left[ v_1^{(0)} , \ldots , v_6^{(1)} , v_7^{(0)} \left( v_4^{(0)} , v_6^{(0)} \right) , v_7^{(1)} \left( v_4^{(1)} , v_6^{(1)} \right) \right] @]@ It follows that @[@ \begin{array}{rcll} \D{f_6}{v_4^{(1)}} & = & \D{f_7}{v_4^{(1)}} + \D{f_7}{v_7^{(1)}} * \D{v_7^{(1)}}{v_4^{(1)}} & = 1 \\ \D{f_6}{v_6^{(1)}} & = & \D{f_7}{v_6^{(1)}} + \D{f_7}{v_7^{(1)}} * \D{v_7^{(1)}}{v_6^{(1)}} & = 1 \end{array} @]@ All the other partial derivatives of @(@ f_6 @)@ are zero.

Index 6: f_5
The previous operation has index 6, @[@ \begin{array}{rcl} v_6^{(0)} & = & v_5^{(0)} / 2 \\ v_6^{(1)} & = & v_5^{(1)} / 2 \end{array} @]@ We define the function @(@ f_5 \left( v_1^{(0)} , \ldots , v_5^{(1)} \right) @)@ as equal to @(@ f_6 @)@ except that @(@ v_6^{(0)} @)@ and @(@ v_6^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_5 = f_6 \left[ v_1^{(0)} , \ldots , v_5^{(1)} , v_6^{(0)} \left( v_5^{(0)} \right) , v_6^{(1)} \left( v_5^{(1)} \right) \right] @]@ It follows that @[@ \begin{array}{rcll} \D{f_5}{v_4^{(1)}} & = & \D{f_6}{v_4^{(1)}} & = 1 \\ \D{f_5}{v_5^{(1)}} & = & \D{f_6}{v_5} + \D{f_6}{v_6^{(1)}} * \D{v_6^{(1)}}{v_5^{(1)}} & = 0.5 \end{array} @]@ All the other partial derivatives of @(@ f_5 @)@ are zero.

Index 5: f_4
The previous operation has index 5, @[@ \begin{array}{rcl} v_5^{(0)} & = & v_3^{(0)} * v_1^{(0)} \\ v_5^{(1)} & = & v_3^{(1)} * v_1^{(0)} + v_3^{(0)} * v_1^{(1)} \end{array} @]@ We define the function @(@ f_4 \left( v_1^{(0)} , \ldots , v_4^{(1)} \right) @)@ as equal to @(@ f_5 @)@ except that @(@ v_5^{(0)} @)@ and @(@ v_5^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_4 = f_5 \left[ v_1^{(0)} , \ldots , v_4^{(1)} , v_5^{(0)} \left( v_1^{(0)}, v_3^{(0)} \right) , v_5^{(1)} \left( v_1^{(0)}, v_1^{(1)}, v_3^{(0)} , v_3^{(1)} \right) , \right] @]@ Given the information from the forward sweep, we have @(@ v_1^{(0)} = 0.5 @)@, @(@ v_3^{(0)} = 0.5 @)@, @(@ v_1^{(1)} = 1 @)@, @(@ v_3^{(1)} = 1 @)@, and the fact that the partial of @(@ f_5 @)@ with respect to @(@ v_5^{(0)} @)@ is zero, we have @[@ \begin{array}{rcll} \D{f_4}{v_1^{(0)}} & = & \D{f_5}{v_1^{(0)}} + \D{f_5}{v_5^{(1)}} * \D{v_5^{(1)}}{v_1^{(0)}} & = 0.5 \\ \D{f_4}{v_1^{(1)}} & = & \D{f_5}{v_1^{(1)}} + \D{f_5}{v_5^{(1)}} * \D{v_5^{(1)}}{v_1^{(1)}} & = 0.25 \\ \D{f_4}{v_3^{(0)}} & = & \D{f_5}{v_3^{(0)}} + \D{f_5}{v_5^{(1)}} * \D{v_5^{(1)}}{v_3^{(0)}} & = 0.5 \\ \D{f_4}{v_3^{(1)}} & = & \D{f_3}{v_1^{(1)}} + \D{f_5}{v_5^{(1)}} * \D{v_5^{(1)}}{v_3^{(1)}} & = 0.25 \\ \D{f_4}{v_4^{(1)}} & = & \D{f_5}{v_4^{(1)}} & = 1 \end{array} @]@ All the other partial derivatives of @(@ f_5 @)@ are zero.

Index 4: f_3
The previous operation has index 4, @[@ \begin{array}{rcl} v_4^{(0)} = 1 + v_3^{(0)} \\ v_4^{(1)} = v_3^{(1)} \end{array} @]@ We define the function @(@ f_3 \left( v_1^{(0)} , \ldots , v_3^{(1)} \right) @)@ as equal to @(@ f_4 @)@ except that @(@ v_4^{(0)} @)@ and @(@ v_4^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_3 = f_4 \left[ v_1^{(0)} , \ldots , v_3^{(1)} , v_4^{(0)} \left( v_3^{(0)} \right) , v_4^{(1)} \left( v_3^{(1)} \right) \right] @]@ It follows that @[@ \begin{array}{rcll} \D{f_3}{v_1^{(0)}} & = & \D{f_4}{v_1^{(0)}} & = 0.5 \\ \D{f_3}{v_1^{(1)}} & = & \D{f_4}{v_1^{(1)}} & = 0.25 \\ \D{f_3}{v_2^{(0)}} & = & \D{f_4}{v_2^{(0)}} & = 0 \\ \D{f_3}{v_2^{(1)}} & = & \D{f_4}{v_2^{(1)}} & = 0 \\ \D{f_3}{v_3^{(0)}} & = & \D{f_4}{v_3^{(0)}} + \D{f_4}{v_4^{(0)}} * \D{v_4^{(0)}}{v_3^{(0)}} & = 0.5 \\ \D{f_3}{v_3^{(1)}} & = & \D{f_4}{v_3^{(1)}} + \D{f_4}{v_4^{(1)}} * \D{v_4^{(1)}}{v_3^{(1)}} & = 1.25 \end{array} @]@

Index 3: f_2
The previous operation has index 3, @[@ \begin{array}{rcl} v_3^{(0)} & = & v_2^{(0)} / 1 \\ v_3^{(1)} & = & v_2^{(1)} / 1 \end{array} @]@ We define the function @(@ f_2 \left( v_1^{(0)} , \ldots , v_2^{(1)} \right) @)@ as equal to @(@ f_3 @)@ except that @(@ v_3^{(0)} @)@ and @(@ v_3^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_2 = f_3 \left[ v_1^{(0)} , \ldots , v_2^{(1)} , v_3^{(0)} \left( v_2^{(0)} \right) , v_3^{(1)} \left( v_2^{(1)} \right) \right] @]@ It follows that @[@ \begin{array}{rcll} \D{f_2}{v_1^{(0)}} & = & \D{f_3}{v_1^{(0)}} & = 0.5 \\ \D{f_2}{v_1^{(1)}} & = & \D{f_3}{v_1^{(1)}} & = 0.25 \\ \D{f_2}{v_2^{(0)}} & = & \D{f_3}{v_2^{(0)}} + \D{f_3}{v_3^{(0)}} * \D{v_3^{(0)}}{v_2^{(0)}} & = 0.5 \\ \D{f_2}{v_2^{(1)}} & = & \D{f_3}{v_2^{(1)}} + \D{f_3}{v_3^{(1)}} * \D{v_3^{(1)}}{v_2^{(0)}} & = 1.25 \end{array} @]@

Index 2: f_1
The previous operation has index 1, @[@ \begin{array}{rcl} v_2^{(0)} & = & 1 * v_1^{(0)} \\ v_2^{(1)} & = & 1 * v_1^{(1)} \end{array} @]@ We define the function @(@ f_1 \left( v_1^{(0)} , v_1^{(1)} \right) @)@ as equal to @(@ f_2 @)@ except that @(@ v_2^{(0)} @)@ and @(@ v_2^{(1)} @)@ are eliminated using this operation; i.e. @[@ f_1 = f_2 \left[ v_1^{(0)} , v_1^{(1)} , v_2^{(0)} \left( v_1^{(0)} \right) , v_2^{(1)} \left( v_1^{(1)} \right) \right] @]@ It follows that @[@ \begin{array}{rcll} \D{f_1}{v_1^{(0)}} & = & \D{f_2}{v_1^{(0)}} + \D{f_2}{v_2^{(0)}} * \D{v_2^{(0)}}{v_1^{(0)}} & = 1 \\ \D{f_1}{v_1^{(1)}} & = & \D{f_2}{v_1^{(1)}} + \D{f_2}{v_2^{(1)}} * \D{v_2^{(1)}}{v_1^{(1)}} & = 1.5 \end{array} @]@ Note that @(@ v_1 @)@ is equal to @(@ x @)@, so the second partial derivative of exp_eps(x, epsilon) at x equal to .5 and epsilon equal .2 is @[@ \Dpow{2}{x} v_7^{(0)} = \D{v_7^{(1)}}{x} = \D{f_1}{v_1^{(0)}} = 1 @]@ There is a theorem about algorithmic differentiation that explains why the other partial of @(@ f_1 @)@ is equal to the first partial of exp_eps(x, epsilon) with respect to @(@ x @)@.

Verification
The file exp_eps_rev2.cpp contains a routine that verifies the values computed above. It returns true for success and false for failure. It only tests the partial derivatives of @(@ f_j @)@ that might not be equal to the corresponding partials of @(@ f_{j+1} @)@; i.e., the other partials of @(@ f_j @)@ must be equal to the corresponding partials of @(@ f_{j+1} @)@.

Exercises

Consider the case where @(@ x = .1 @)@ and we first preform a zero order forward mode sweep for the operation sequence used above (in reverse order). What are the results of a first order reverse mode sweep; i.e., what are the corresponding values for @(@ \D{f_j}{v_k} @)@ for all @(@ j, k @)@ such that @(@ \D{f_j}{v_k} \neq 0 @)@.
Create a modified version of exp_eps_rev2.cpp that verifies the values you obtained for the previous exercise. Also create and run a main program that reports the result of calling the modified version of exp_eps_rev2.cpp .

Input File: introduction/exp_eps.omh