Blame docs/source/automatic_derivatives.rst

Packit ea1746
.. default-domain:: cpp
Packit ea1746
Packit ea1746
.. cpp:namespace:: ceres
Packit ea1746
Packit ea1746
.. _chapter-automatic_derivatives:
Packit ea1746
Packit ea1746
=====================
Packit ea1746
Automatic Derivatives
Packit ea1746
=====================
Packit ea1746
Packit ea1746
We will now consider automatic differentiation. It is a technique that
Packit ea1746
can compute exact derivatives, fast, while requiring about the same
Packit ea1746
effort from the user as is needed to use numerical differentiation.
Packit ea1746
Packit ea1746
Don't believe me? Well here goes. The following code fragment
Packit ea1746
implements an automatically differentiated ``CostFunction`` for `Rat43
Packit ea1746
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
  struct Rat43CostFunctor {
Packit ea1746
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}
Packit ea1746
Packit ea1746
    template <typename T>
Packit ea1746
    bool operator()(const T* parameters, T* residuals) const {
Packit ea1746
      const T b1 = parameters[0];
Packit ea1746
      const T b2 = parameters[1];
Packit ea1746
      const T b3 = parameters[2];
Packit ea1746
      const T b4 = parameters[3];
Packit ea1746
      residuals[0] = b1 * pow(1.0 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;
Packit ea1746
      return true;
Packit ea1746
    }
Packit ea1746
Packit ea1746
    private:
Packit ea1746
      const double x_;
Packit ea1746
      const double y_;
Packit ea1746
  };
Packit ea1746
Packit ea1746
Packit ea1746
  CostFunction* cost_function =
Packit ea1746
        new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
Packit ea1746
          new Rat43CostFunctor(x, y));
Packit ea1746
Packit ea1746
Notice that compared to numeric differentiation, the only difference
Packit ea1746
when defining the functor for use with automatic differentiation is
Packit ea1746
the signature of the ``operator()``.
Packit ea1746
Packit ea1746
In the case of numeric differentition it was
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
   bool operator()(const double* parameters, double* residuals) const;
Packit ea1746
Packit ea1746
and for automatic differentiation it is a templated function of the
Packit ea1746
form
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
   template <typename T> bool operator()(const T* parameters, T* residuals) const;
Packit ea1746
Packit ea1746
Packit ea1746
So what does this small change buy us? The following table compares
Packit ea1746
the time it takes to evaluate the residual and the Jacobian for
Packit ea1746
`Rat43` using various methods.
Packit ea1746
Packit ea1746
==========================   =========
Packit ea1746
CostFunction                 Time (ns)
Packit ea1746
==========================   =========
Packit ea1746
Rat43Analytic                      255
Packit ea1746
Rat43AnalyticOptimized              92
Packit ea1746
Rat43NumericDiffForward            262
Packit ea1746
Rat43NumericDiffCentral            517
Packit ea1746
Rat43NumericDiffRidders           3760
Packit ea1746
Rat43AutomaticDiff                 129
Packit ea1746
==========================   =========
Packit ea1746
Packit ea1746
We can get exact derivatives using automatic differentiation
Packit ea1746
(``Rat43AutomaticDiff``) with about the same effort that is required
Packit ea1746
to write the code for numeric differentiation but only :math:`40\%`
Packit ea1746
slower than hand optimized analytical derivatives.
Packit ea1746
Packit ea1746
So how does it work? For this we will have to learn about **Dual
Packit ea1746
Numbers** and **Jets** .
Packit ea1746
Packit ea1746
Packit ea1746
Dual Numbers & Jets
Packit ea1746
===================
Packit ea1746
Packit ea1746
.. NOTE::
Packit ea1746
Packit ea1746
   Reading this and the next section on implementing Jets is not
Packit ea1746
   necessary to use automatic differentiation in Ceres Solver. But
Packit ea1746
   knowing the basics of how Jets work is useful when debugging and
Packit ea1746
   reasoning about the performance of automatic differentiation.
Packit ea1746
Packit ea1746
Dual numbers are an extension of the real numbers analogous to complex
Packit ea1746
numbers: whereas complex numbers augment the reals by introducing an
Packit ea1746
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
Packit ea1746
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
Packit ea1746
:math:`\epsilon^2 = 0` . A dual number :math:`a + v\epsilon` has two
Packit ea1746
components, the *real* component :math:`a` and the *infinitesimal*
Packit ea1746
component :math:`v`.
Packit ea1746
Packit ea1746
Surprisingly, this simple change leads to a convenient method for
Packit ea1746
computing exact derivatives without needing to manipulate complicated
Packit ea1746
symbolic expressions.
Packit ea1746
Packit ea1746
For example, consider the function
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
Packit ea1746
   f(x) = x^2 ,
Packit ea1746
Packit ea1746
Then,
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
Packit ea1746
   \begin{align}
Packit ea1746
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
Packit ea1746
            &= 100 + 20 \epsilon + \epsilon^2\\
Packit ea1746
            &= 100 + 20 \epsilon
Packit ea1746
   \end{align}
Packit ea1746
Packit ea1746
Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
Packit ea1746
20`. Indeed this generalizes to functions which are not
Packit ea1746
polynomial. Consider an arbitrary differentiable function
Packit ea1746
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
Packit ea1746
considering the Taylor expansion of :math:`f` near :math:`x`, which
Packit ea1746
gives us the infinite series
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
   \begin{align}
Packit ea1746
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
Packit ea1746
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
Packit ea1746
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
Packit ea1746
   \end{align}
Packit ea1746
Packit ea1746
Here we are using the fact that :math:`\epsilon^2 = 0`.
Packit ea1746
Packit ea1746
A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is a
Packit ea1746
:math:`n`-dimensional dual number, where we augment the real numbers
Packit ea1746
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
Packit ea1746
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
Packit ea1746
a Jet consists of a *real* part :math:`a` and a :math:`n`-dimensional
Packit ea1746
*infinitesimal* part :math:`\mathbf{v}`, i.e.,
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
   x = a + \sum_j v_{j} \epsilon_j
Packit ea1746
Packit ea1746
The summation notation gets tedious, so we will also just write
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
   x = a + \mathbf{v}.
Packit ea1746
Packit ea1746
where the :math:`\epsilon_i`'s are implict. Then, using the same
Packit ea1746
Taylor series expansion used above, we can see that:
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
Packit ea1746
  f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.
Packit ea1746
Packit ea1746
Similarly for a multivariate function
Packit ea1746
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
Packit ea1746
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i
Packit ea1746
Packit ea1746
So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
Packit ea1746
standard basis vector, then, the above expression would simplify to
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i
Packit ea1746
Packit ea1746
and we can extract the coordinates of the Jacobian by inspecting the
Packit ea1746
coefficients of :math:`\epsilon_i`.
Packit ea1746
Packit ea1746
Implementing Jets
Packit ea1746
-----------------
Packit ea1746
Packit ea1746
In order for the above to work in practice, we will need the ability
Packit ea1746
to evaluate an arbitrary function :math:`f` not just on real numbers
Packit ea1746
but also on dual numbers, but one does not usually evaluate functions
Packit ea1746
by evaluating their Taylor expansions,
Packit ea1746
Packit ea1746
This is where C++ templates and operator overloading comes into
Packit ea1746
play. The following code fragment has a simple implementation of a
Packit ea1746
``Jet`` and some operators/functions that operate on them.
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
   template<int N> struct Jet {
Packit ea1746
     double a;
Packit ea1746
     Eigen::Matrix<double, 1, N> v;
Packit ea1746
   };
Packit ea1746
Packit ea1746
   template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
Packit ea1746
     return Jet<N>(f.a + g.a, f.v + g.v);
Packit ea1746
   }
Packit ea1746
Packit ea1746
   template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
Packit ea1746
     return Jet<N>(f.a - g.a, f.v - g.v);
Packit ea1746
   }
Packit ea1746
Packit ea1746
   template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
Packit ea1746
     return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
Packit ea1746
   }
Packit ea1746
Packit ea1746
   template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
Packit ea1746
     return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
Packit ea1746
   }
Packit ea1746
Packit ea1746
   template <int N> Jet<N> exp(const Jet<N>& f) {
Packit ea1746
     return Jet<T, N>(exp(f.a), exp(f.a) * f.v);
Packit ea1746
   }
Packit ea1746
Packit ea1746
   // This is a simple implementation for illustration purposes, the
Packit ea1746
   // actual implementation of pow requires careful handling of a number
Packit ea1746
   // of corner cases.
Packit ea1746
   template <int N>  Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
Packit ea1746
     return Jet<N>(pow(f.a, g.a),
Packit ea1746
                   g.a * pow(f.a, g.a - 1.0) * f.v +
Packit ea1746
                   pow(f.a, g.a) * log(f.a); * g.v);
Packit ea1746
   }
Packit ea1746
Packit ea1746
Packit ea1746
With these overloaded functions in hand, we can now call
Packit ea1746
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
Packit ea1746
that together with appropriately initialized Jets allows us to compute
Packit ea1746
the Jacobian as follows:
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
  class Rat43Automatic : public ceres::SizedCostFunction<1,4> {
Packit ea1746
   public:
Packit ea1746
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
Packit ea1746
    virtual ~Rat43Automatic() {}
Packit ea1746
    virtual bool Evaluate(double const* const* parameters,
Packit ea1746
                          double* residuals,
Packit ea1746
                          double** jacobians) const {
Packit ea1746
      // Just evaluate the residuals if Jacobians are not required.
Packit ea1746
      if (!jacobians) return (*functor_)(parameters[0], residuals);
Packit ea1746
Packit ea1746
      // Initialize the Jets
Packit ea1746
      ceres::Jet<4> jets[4];
Packit ea1746
      for (int i = 0; i < 4; ++i) {
Packit ea1746
        jets[i].a = parameters[0][i];
Packit ea1746
        jets[i].v.setZero();
Packit ea1746
        jets[i].v[i] = 1.0;
Packit ea1746
      }
Packit ea1746
Packit ea1746
      ceres::Jet<4> result;
Packit ea1746
      (*functor_)(jets, &result);
Packit ea1746
Packit ea1746
      // Copy the values out of the Jet.
Packit ea1746
      residuals[0] = result.a;
Packit ea1746
      for (int i = 0; i < 4; ++i) {
Packit ea1746
        jacobians[0][i] = result.v[i];
Packit ea1746
      }
Packit ea1746
      return true;
Packit ea1746
    }
Packit ea1746
Packit ea1746
   private:
Packit ea1746
    std::unique_ptr<const Rat43CostFunctor> functor_;
Packit ea1746
  };
Packit ea1746
Packit ea1746
Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
Packit ea1746
Packit ea1746
Packit ea1746
Pitfalls
Packit ea1746
========
Packit ea1746
Packit ea1746
Automatic differentiation frees the user from the burden of computing
Packit ea1746
and reasoning about the symbolic expressions for the Jacobians, but
Packit ea1746
this freedom comes at a cost. For example consider the following
Packit ea1746
simple functor:
Packit ea1746
Packit ea1746
.. code-block:: c++
Packit ea1746
Packit ea1746
   struct Functor {
Packit ea1746
     template <typename T> bool operator()(const T* x, T* residual) const {
Packit ea1746
       residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
Packit ea1746
       return true;
Packit ea1746
     }
Packit ea1746
   };
Packit ea1746
Packit ea1746
Looking at the code for the residual computation, one does not foresee
Packit ea1746
any problems. However, if we look at the analytical expressions for
Packit ea1746
the Jacobian:
Packit ea1746
Packit ea1746
.. math::
Packit ea1746
Packit ea1746
      y &= 1 - \sqrt{x_0^2 + x_1^2}\\
Packit ea1746
   D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
Packit ea1746
   D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}
Packit ea1746
Packit ea1746
we find that it is an indeterminate form at :math:`x_0 = 0, x_1 =
Packit ea1746
0`.
Packit ea1746
Packit ea1746
There is no single solution to this problem. In some cases one needs
Packit ea1746
to reason explicitly about the points where indeterminacy may occur
Packit ea1746
and use alternate expressions using `L'Hopital's rule
Packit ea1746
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
Packit ea1746
example some of the conversion routines in `rotation.h
Packit ea1746
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_. In
Packit ea1746
other cases, one may need to regularize the expressions to eliminate
Packit ea1746
these points.