|
Packit |
ea1746 |
.. default-domain:: cpp
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. cpp:namespace:: ceres
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. _chapter-automatic_derivatives:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
=====================
|
|
Packit |
ea1746 |
Automatic Derivatives
|
|
Packit |
ea1746 |
=====================
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
We will now consider automatic differentiation. It is a technique that
|
|
Packit |
ea1746 |
can compute exact derivatives, fast, while requiring about the same
|
|
Packit |
ea1746 |
effort from the user as is needed to use numerical differentiation.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Don't believe me? Well here goes. The following code fragment
|
|
Packit |
ea1746 |
implements an automatically differentiated ``CostFunction`` for `Rat43
|
|
Packit |
ea1746 |
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
struct Rat43CostFunctor {
|
|
Packit |
ea1746 |
Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template <typename T>
|
|
Packit |
ea1746 |
bool operator()(const T* parameters, T* residuals) const {
|
|
Packit |
ea1746 |
const T b1 = parameters[0];
|
|
Packit |
ea1746 |
const T b2 = parameters[1];
|
|
Packit |
ea1746 |
const T b3 = parameters[2];
|
|
Packit |
ea1746 |
const T b4 = parameters[3];
|
|
Packit |
ea1746 |
residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
|
|
Packit |
ea1746 |
return true;
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
private:
|
|
Packit |
ea1746 |
const double x_;
|
|
Packit |
ea1746 |
const double y_;
|
|
Packit |
ea1746 |
};
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
CostFunction* cost_function =
|
|
Packit |
ea1746 |
new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
|
|
Packit |
ea1746 |
new Rat43CostFunctor(x, y));
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Notice that compared to numeric differentiation, the only difference
|
|
Packit |
ea1746 |
when defining the functor for use with automatic differentiation is
|
|
Packit |
ea1746 |
the signature of the ``operator()``.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
In the case of numeric differentition it was
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
bool operator()(const double* parameters, double* residuals) const;
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
and for automatic differentiation it is a templated function of the
|
|
Packit |
ea1746 |
form
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template <typename T> bool operator()(const T* parameters, T* residuals) const;
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
So what does this small change buy us? The following table compares
|
|
Packit |
ea1746 |
the time it takes to evaluate the residual and the Jacobian for
|
|
Packit |
ea1746 |
`Rat43` using various methods.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
========================== =========
|
|
Packit |
ea1746 |
CostFunction Time (ns)
|
|
Packit |
ea1746 |
========================== =========
|
|
Packit |
ea1746 |
Rat43Analytic 255
|
|
Packit |
ea1746 |
Rat43AnalyticOptimized 92
|
|
Packit |
ea1746 |
Rat43NumericDiffForward 262
|
|
Packit |
ea1746 |
Rat43NumericDiffCentral 517
|
|
Packit |
ea1746 |
Rat43NumericDiffRidders 3760
|
|
Packit |
ea1746 |
Rat43AutomaticDiff 129
|
|
Packit |
ea1746 |
========================== =========
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
We can get exact derivatives using automatic differentiation
|
|
Packit |
ea1746 |
(``Rat43AutomaticDiff``) with about the same effort that is required
|
|
Packit |
ea1746 |
to write the code for numeric differentiation but only :math:`40\%`
|
|
Packit |
ea1746 |
slower than hand optimized analytical derivatives.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
So how does it work? For this we will have to learn about **Dual
|
|
Packit |
ea1746 |
Numbers** and **Jets** .
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Dual Numbers & Jets
|
|
Packit |
ea1746 |
===================
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. NOTE::
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Reading this and the next section on implementing Jets is not
|
|
Packit |
ea1746 |
necessary to use automatic differentiation in Ceres Solver. But
|
|
Packit |
ea1746 |
knowing the basics of how Jets work is useful when debugging and
|
|
Packit |
ea1746 |
reasoning about the performance of automatic differentiation.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Dual numbers are an extension of the real numbers analogous to complex
|
|
Packit |
ea1746 |
numbers: whereas complex numbers augment the reals by introducing an
|
|
Packit |
ea1746 |
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
|
|
Packit |
ea1746 |
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
|
|
Packit |
ea1746 |
:math:`\epsilon^2 = 0` . A dual number :math:`a + v\epsilon` has two
|
|
Packit |
ea1746 |
components, the *real* component :math:`a` and the *infinitesimal*
|
|
Packit |
ea1746 |
component :math:`v`.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Surprisingly, this simple change leads to a convenient method for
|
|
Packit |
ea1746 |
computing exact derivatives without needing to manipulate complicated
|
|
Packit |
ea1746 |
symbolic expressions.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
For example, consider the function
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
f(x) = x^2 ,
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Then,
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
\begin{align}
|
|
Packit |
ea1746 |
f(10 + \epsilon) &= (10 + \epsilon)^2\\
|
|
Packit |
ea1746 |
&= 100 + 20 \epsilon + \epsilon^2\\
|
|
Packit |
ea1746 |
&= 100 + 20 \epsilon
|
|
Packit |
ea1746 |
\end{align}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
|
|
Packit |
ea1746 |
20`. Indeed this generalizes to functions which are not
|
|
Packit |
ea1746 |
polynomial. Consider an arbitrary differentiable function
|
|
Packit |
ea1746 |
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
|
|
Packit |
ea1746 |
considering the Taylor expansion of :math:`f` near :math:`x`, which
|
|
Packit |
ea1746 |
gives us the infinite series
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
\begin{align}
|
|
Packit |
ea1746 |
f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
|
|
Packit |
ea1746 |
\frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
|
|
Packit |
ea1746 |
f(x + \epsilon) &= f(x) + Df(x) \epsilon
|
|
Packit |
ea1746 |
\end{align}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Here we are using the fact that :math:`\epsilon^2 = 0`.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is a
|
|
Packit |
ea1746 |
:math:`n`-dimensional dual number, where we augment the real numbers
|
|
Packit |
ea1746 |
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
|
|
Packit |
ea1746 |
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
|
|
Packit |
ea1746 |
a Jet consists of a *real* part :math:`a` and a :math:`n`-dimensional
|
|
Packit |
ea1746 |
*infinitesimal* part :math:`\mathbf{v}`, i.e.,
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
x = a + \sum_j v_{j} \epsilon_j
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
The summation notation gets tedious, so we will also just write
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
x = a + \mathbf{v}.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
where the :math:`\epsilon_i`'s are implict. Then, using the same
|
|
Packit |
ea1746 |
Taylor series expansion used above, we can see that:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Similarly for a multivariate function
|
|
Packit |
ea1746 |
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
|
|
Packit |
ea1746 |
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
|
|
Packit |
ea1746 |
standard basis vector, then, the above expression would simplify to
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
and we can extract the coordinates of the Jacobian by inspecting the
|
|
Packit |
ea1746 |
coefficients of :math:`\epsilon_i`.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Implementing Jets
|
|
Packit |
ea1746 |
-----------------
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
In order for the above to work in practice, we will need the ability
|
|
Packit |
ea1746 |
to evaluate an arbitrary function :math:`f` not just on real numbers
|
|
Packit |
ea1746 |
but also on dual numbers, but one does not usually evaluate functions
|
|
Packit |
ea1746 |
by evaluating their Taylor expansions,
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
This is where C++ templates and operator overloading comes into
|
|
Packit |
ea1746 |
play. The following code fragment has a simple implementation of a
|
|
Packit |
ea1746 |
``Jet`` and some operators/functions that operate on them.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template<int N> struct Jet {
|
|
Packit |
ea1746 |
double a;
|
|
Packit |
ea1746 |
Eigen::Matrix<double, 1, N> v;
|
|
Packit |
ea1746 |
};
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
|
|
Packit |
ea1746 |
return Jet<N>(f.a + g.a, f.v + g.v);
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
|
|
Packit |
ea1746 |
return Jet<N>(f.a - g.a, f.v - g.v);
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
|
|
Packit |
ea1746 |
return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
|
|
Packit |
ea1746 |
return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
template <int N> Jet<N> exp(const Jet<N>& f) {
|
|
Packit |
ea1746 |
return Jet<T, N>(exp(f.a), exp(f.a) * f.v);
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
// This is a simple implementation for illustration purposes, the
|
|
Packit |
ea1746 |
// actual implementation of pow requires careful handling of a number
|
|
Packit |
ea1746 |
// of corner cases.
|
|
Packit |
ea1746 |
template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
|
|
Packit |
ea1746 |
return Jet<N>(pow(f.a, g.a),
|
|
Packit |
ea1746 |
g.a * pow(f.a, g.a - 1.0) * f.v +
|
|
Packit |
ea1746 |
pow(f.a, g.a) * log(f.a); * g.v);
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
With these overloaded functions in hand, we can now call
|
|
Packit |
ea1746 |
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
|
|
Packit |
ea1746 |
that together with appropriately initialized Jets allows us to compute
|
|
Packit |
ea1746 |
the Jacobian as follows:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
class Rat43Automatic : public ceres::SizedCostFunction<1,4> {
|
|
Packit |
ea1746 |
public:
|
|
Packit |
ea1746 |
Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
|
|
Packit |
ea1746 |
virtual ~Rat43Automatic() {}
|
|
Packit |
ea1746 |
virtual bool Evaluate(double const* const* parameters,
|
|
Packit |
ea1746 |
double* residuals,
|
|
Packit |
ea1746 |
double** jacobians) const {
|
|
Packit |
ea1746 |
// Just evaluate the residuals if Jacobians are not required.
|
|
Packit |
ea1746 |
if (!jacobians) return (*functor_)(parameters[0], residuals);
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
// Initialize the Jets
|
|
Packit |
ea1746 |
ceres::Jet<4> jets[4];
|
|
Packit |
ea1746 |
for (int i = 0; i < 4; ++i) {
|
|
Packit |
ea1746 |
jets[i].a = parameters[0][i];
|
|
Packit |
ea1746 |
jets[i].v.setZero();
|
|
Packit |
ea1746 |
jets[i].v[i] = 1.0;
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
ceres::Jet<4> result;
|
|
Packit |
ea1746 |
(*functor_)(jets, &result);
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
// Copy the values out of the Jet.
|
|
Packit |
ea1746 |
residuals[0] = result.a;
|
|
Packit |
ea1746 |
for (int i = 0; i < 4; ++i) {
|
|
Packit |
ea1746 |
jacobians[0][i] = result.v[i];
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
return true;
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
private:
|
|
Packit |
ea1746 |
std::unique_ptr<const Rat43CostFunctor> functor_;
|
|
Packit |
ea1746 |
};
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Pitfalls
|
|
Packit |
ea1746 |
========
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Automatic differentiation frees the user from the burden of computing
|
|
Packit |
ea1746 |
and reasoning about the symbolic expressions for the Jacobians, but
|
|
Packit |
ea1746 |
this freedom comes at a cost. For example consider the following
|
|
Packit |
ea1746 |
simple functor:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. code-block:: c++
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
struct Functor {
|
|
Packit |
ea1746 |
template <typename T> bool operator()(const T* x, T* residual) const {
|
|
Packit |
ea1746 |
residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
|
|
Packit |
ea1746 |
return true;
|
|
Packit |
ea1746 |
}
|
|
Packit |
ea1746 |
};
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
Looking at the code for the residual computation, one does not foresee
|
|
Packit |
ea1746 |
any problems. However, if we look at the analytical expressions for
|
|
Packit |
ea1746 |
the Jacobian:
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
.. math::
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
y &= 1 - \sqrt{x_0^2 + x_1^2}\\
|
|
Packit |
ea1746 |
D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
|
|
Packit |
ea1746 |
D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
we find that it is an indeterminate form at :math:`x_0 = 0, x_1 =
|
|
Packit |
ea1746 |
0`.
|
|
Packit |
ea1746 |
|
|
Packit |
ea1746 |
There is no single solution to this problem. In some cases one needs
|
|
Packit |
ea1746 |
to reason explicitly about the points where indeterminacy may occur
|
|
Packit |
ea1746 |
and use alternate expressions using `L'Hopital's rule
|
|
Packit |
ea1746 |
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
|
|
Packit |
ea1746 |
example some of the conversion routines in `rotation.h
|
|
Packit |
ea1746 |
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_. In
|
|
Packit |
ea1746 |
other cases, one may need to regularize the expressions to eliminate
|
|
Packit |
ea1746 |
these points.
|