
Syntax
ok = multi_atomic_setup(y_squared)

Purpose
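This routine divides the computation among the individual threads. It records a function that computes a square root using the atomic square root operation, then gives each thread its own copy of that function together with its share of the y_squared values.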

It is assumed that this function is called by thread zero and all the other threads are blocked (waiting).
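The division of work can be illustrated with a short standard C++ sketch. It assumes that each thread receives a contiguous run of at most per_thread = ceil(n / num_threads) values, where n is the size of y_squared. Threads at the end may receive fewer values, or none. The function name split_among_threads is hypothetical and is not part of CppAD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split y_squared into num_threads contiguous chunks of
// at most per_thread = ceil(n / num_threads) values each. Later chunks may be
// shorter or empty when n is not a multiple of num_threads.
std::vector<std::vector<double>> split_among_threads(
    const std::vector<double>& y_squared, std::size_t num_threads)
{
    assert(num_threads > 0);
    std::size_t n = y_squared.size();
    // ceiling division, so that per_thread * num_threads >= n
    std::size_t per_thread = (n + num_threads - 1) / num_threads;
    std::vector<std::vector<double>> chunks(num_threads);
    std::size_t y_index = 0;
    for (std::size_t thread_num = 0; thread_num < num_threads; ++thread_num)
    {
        for (std::size_t i = 0; i < per_thread && y_index < n; ++i)
            chunks[thread_num].push_back(y_squared[y_index++]);
    }
    // every value has been assigned to exactly one thread
    assert(y_index == n);
    return chunks;
}
```

For example, ten values split among three threads give chunks of 4, 4, and 2 values. Ceiling division guarantees that every value is assigned, at the cost of leaving the last thread or threads with less work.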

y_squared
This argument has prototype

    const vector<double>& y_squared

and its size is equal to the number of equations to solve. It contains the values whose square roots are being computed.
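The atomic function in the Source section is called with three arguments: num_itr, y_initial, and y_squared. Assuming that it applies num_itr steps of Newton's method to the equation y^2 - y_squared = 0, starting at y_initial (the atomic function's body is not shown here, so this is an assumption), the computation for a single value looks like the sketch below. Each Newton step is y <- y - (y^2 - y_squared) / (2 y) = (y + y_squared / y) / 2. The name newton_square_root is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, assuming the atomic square root uses Newton's method:
// apply num_itr Newton steps to y * y - y_squared = 0, starting at y_initial.
// Each step is y <- (y + y_squared / y) / 2.
double newton_square_root(std::size_t num_itr, double y_initial, double y_squared)
{
    assert(y_initial > 0.0 && y_squared >= 0.0);
    double y = y_initial;
    for (std::size_t itr = 0; itr < num_itr; ++itr)
        y = (y + y_squared / y) / 2.0;
    return y;
}
```

Near the root, Newton's method roughly doubles the number of correct digits per step, so a fixed, modest num_itr is enough when y_initial is of the right order of magnitude.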

ok
This return value has prototype

    bool ok

If it is false, multi_atomic_setup detected an error.

Source

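namespace {
bool multi_atomic_setup(const vector<double>& y_squared)
{   using CppAD::thread_alloc;
    size_t num_threads = thread_alloc::num_threads();
    bool ok = true;
    //
    // declare independent variable vector
    vector< AD<double> > ax(1);
    ax[0] = 2.0;
    CppAD::Independent(ax);
    //
    // argument and result for atomic function
    vector< AD<double> > au(3), ay(1);
    au[0] = AD<double>( num_itr_ ); // num_itr
    au[1] = ax[0];                  // y_initial
    au[2] = ax[0];                  // y_squared
    // put user atomic operation in recording
    (*a_square_root_)(au, ay);
    //
    // f(u) = sqrt(u)
    CppAD::ADFun<double> fun(ax, ay);
    //
    // number of square roots for each thread (ceiling division)
    size_t per_thread = (y_squared.size() + num_threads - 1) / num_threads;
    size_t y_index    = 0;
    //
    for(size_t thread_num = 0; thread_num < num_threads; thread_num++)
    {   // allocate separate memory for each thread to avoid false sharing
        size_t min_bytes(sizeof(work_one_t)), cap_bytes;
        void* v_ptr = thread_alloc::get_memory(min_bytes, cap_bytes);
        work_all_[thread_num] = static_cast<work_one_t*>(v_ptr);
        //
        // Run constructor on work_all_[thread_num]->fun
        work_all_[thread_num]->fun = new CppAD::ADFun<double>;
        //
        // Run constructor on work_all_[thread_num] vectors
        new( &(work_all_[thread_num]->y_squared) ) vector<double>;
        new( &(work_all_[thread_num]->square_root) ) vector<double>;
        //
        // Each worker gets a separate copy of fun. This is necessary because
        // the Taylor coefficients will be set by each thread.
        *(work_all_[thread_num]->fun) = fun;
        //
        // values we are computing square root of for this thread
        for(size_t i = 0; i < per_thread; i++)
        if( y_index < y_squared.size() )
            work_all_[thread_num]->y_squared.push_back(y_squared[y_index++]);
        //
        // set to false until this thread completes its work
        work_all_[thread_num]->ok = false;
    }
    // every value must have been assigned to some thread
    ok &= y_index == y_squared.size();
    //
    return ok;
}
}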
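The comments in the source above describe a common setup pattern: separate memory for each thread to avoid false sharing, and a separate copy of fun for each worker because evaluation writes Taylor coefficients. The following is a minimal standard C++17 sketch of that pattern, not CppAD's implementation. It uses std::make_unique, alignas(64), and std::thread in place of CppAD's thread_alloc and threading setup, assumes a 64-byte cache line, and the names stateful_fun, work_one, and run_on_threads are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function object whose evaluation overwrites
// internal state (an ADFun object, for example, stores Taylor coefficients
// when it is evaluated), so one instance must not be shared between threads.
struct stateful_fun {
    double last_arg = 0.0;                 // state written by every call
    double operator()(double y_squared) {
        assert(y_squared >= 0.0);
        last_arg = y_squared;
        double y = 1.0;                    // Newton iteration for sqrt
        for (int itr = 0; itr < 40; ++itr)
            y = (y + last_arg / y) / 2.0;
        return y;
    }
};

// One record per thread. alignas(64) assumes 64-byte cache lines and keeps
// records of different threads on different lines (avoids false sharing).
// Heap-allocating an over-aligned type with make_unique requires C++17.
struct alignas(64) work_one {
    stateful_fun fun;                      // this thread's private copy
    std::vector<double> y_squared;         // inputs for this thread
    std::vector<double> square_root;       // outputs for this thread
    bool ok = false;                       // false until the thread finishes
};

// Set up one work record per chunk, run one std::thread per record,
// then join the threads and collect the results in chunk order.
bool run_on_threads(const stateful_fun& fun,
                    const std::vector<std::vector<double>>& chunks,
                    std::vector<double>& square_root)
{
    std::vector<std::unique_ptr<work_one>> work_all;
    for (const auto& chunk : chunks) {
        auto work = std::make_unique<work_one>();
        work->fun = fun;                   // separate copy for each thread
        work->y_squared = chunk;
        work_all.push_back(std::move(work));
    }
    std::vector<std::thread> threads;
    for (auto& work : work_all) {
        work_one* w = work.get();          // each thread touches only *w
        threads.emplace_back([w] {
            for (double y2 : w->y_squared)
                w->square_root.push_back(w->fun(y2));
            w->ok = true;
        });
    }
    bool ok = true;
    square_root.clear();
    for (std::size_t t = 0; t < threads.size(); ++t) {
        threads[t].join();                 // results are safe to read after join
        ok = ok && work_all[t]->ok;
        square_root.insert(square_root.end(),
                           work_all[t]->square_root.begin(),
                           work_all[t]->square_root.end());
    }
    return ok;
}
```

For example, with chunks {{4.0, 9.0}, {16.0}, {}} the call returns true and fills square_root with values close to 2, 3, and 4. Copying the function object costs memory per thread, but it removes the need for a lock around every evaluation.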