
.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::
  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::
   \begin{align}
   E(b_1, b_2, b_3, b_4)
   &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
   &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
   \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::
  \begin{align}
  D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
  D_2 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_3 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_4 f(b_1, b_2, b_3, b_4; x,y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
  \end{align}

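Derivative formulas like these are easy to get subtly wrong, so before
committing them to code it is worth checking them against central
differences. The following standalone sketch (not part of Ceres; the
parameter and data values are chosen arbitrarily for illustration)
verifies all four formulas numerically:

.. code-block:: c++

  #include <cassert>
  #include <cmath>
  #include <cstdio>

  // Rat43 residual f(b; x, y) = b1 / (1 + exp(b2 - b3 x))^(1/b4) - y.
  double f(const double b[4], double x, double y) {
    return b[0] * std::pow(1.0 + std::exp(b[1] - b[2] * x), -1.0 / b[3]) - y;
  }

  // Analytic partial derivative D_j f from the formulas above.
  double Df(const double b[4], double x, int j) {
    const double t1 = std::exp(b[1] - b[2] * x);
    const double t2 = 1.0 + t1;
    switch (j) {
      case 0:  return std::pow(t2, -1.0 / b[3]);
      case 1:  return -b[0] * t1 * std::pow(t2, -1.0 / b[3] - 1.0) / b[3];
      case 2:  return x * b[0] * t1 * std::pow(t2, -1.0 / b[3] - 1.0) / b[3];
      default: return b[0] * std::log(t2) * std::pow(t2, -1.0 / b[3]) /
                      (b[3] * b[3]);
    }
  }

  int main() {
    const double b[4] = {700.0, 5.0, 0.75, 1.3};  // arbitrary test point
    const double x = 10.0, y = 16.0;
    for (int j = 0; j < 4; ++j) {
      const double h = 1e-6;
      double bp[4] = {b[0], b[1], b[2], b[3]};
      double bm[4] = {b[0], b[1], b[2], b[3]};
      bp[j] += h;
      bm[j] -= h;
      const double numeric = (f(bp, x, y) - f(bm, x, y)) / (2.0 * h);
      const double analytic = Df(b, x, j);
      assert(std::abs(numeric - analytic) < 1e-4 * (1.0 + std::abs(analytic)));
    }
    std::printf("analytic derivatives match central differences\n");
    return 0;
  }
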
With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
    public:
      Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43Analytic() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
        jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };

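Note the ``jacobians`` checks in ``Evaluate``: Ceres may pass
``nullptr`` for ``jacobians`` (or for an individual ``jacobians[i]``)
when it only needs residuals, and the cost function must handle that.
The following standalone mock (a plain function rather than the real
Ceres class; the numeric values are arbitrary) illustrates this
calling convention:

.. code-block:: c++

  #include <cassert>
  #include <cmath>
  #include <cstdio>

  // Standalone mock of the Evaluate() contract: parameters[i] points to
  // the i-th parameter block; jacobians[i] -- only if both jacobians and
  // jacobians[i] are non-null -- receives d(residual)/d(block i).
  bool Rat43Evaluate(double const* const* parameters, double* residuals,
                     double** jacobians, double x, double y) {
    const double b1 = parameters[0][0];
    const double b2 = parameters[0][1];
    const double b3 = parameters[0][2];
    const double b4 = parameters[0][3];
    residuals[0] = b1 * std::pow(1 + std::exp(b2 - b3 * x), -1.0 / b4) - y;
    if (!jacobians || !jacobians[0]) return true;
    double* jacobian = jacobians[0];
    const double t1 = std::exp(b2 - b3 * x);
    const double t2 = 1 + t1;
    jacobian[0] = std::pow(t2, -1.0 / b4);
    jacobian[1] = -b1 * t1 * std::pow(t2, -1.0 / b4 - 1) / b4;
    jacobian[2] = -x * jacobian[1];
    jacobian[3] = b1 * std::log(t2) * jacobian[0] / (b4 * b4);
    return true;
  }

  int main() {
    const double block0[4] = {700.0, 5.0, 0.75, 1.3};  // one parameter block
    const double* parameters[1] = {block0};
    double residual = 0.0;
    double jacobian_block0[4];
    double* jacobians[1] = {jacobian_block0};

    // Residual only -- the solver passes jacobians == nullptr in this case.
    assert(Rat43Evaluate(parameters, &residual, nullptr, 10.0, 16.0));

    // Residual and Jacobian together.
    assert(Rat43Evaluate(parameters, &residual, jacobians, 10.0, 16.0));
    std::printf("residual = %.6f, df/db1 = %.6f\n",
                residual, jacobian_block0[0]);
    return 0;
  }
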
``Rat43Analytic`` is tedious to write, hard to read, and has a lot of
redundancy. So in practice we cache some sub-expressions to improve
its efficiency, which gives us something like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
    public:
      Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43AnalyticOptimized() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        const double t1 = exp(b2 - b3 * x_);
        const double t2 = 1 + t1;
        const double t3 = pow(t2, -1.0 / b4);
        residuals[0] = b1 * t3 - y_;

        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        const double t4 = pow(t2, -1.0 / b4 - 1);
        jacobian[0] = t3;
        jacobian[1] = -b1 * t1 * t4 / b4;
        jacobian[2] = -x_ * jacobian[1];
        jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };

What is the difference in performance of these two implementations?

==========================   =========
CostFunction                 Time (ns)
==========================   =========
Rat43Analytic                      255
Rat43AnalyticOptimized              92
==========================   =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``.  This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.

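One way to get a feel for this effect is a rough standalone
micro-benchmark (a sketch, not how the table above was produced:
simplified free functions stand in for the cost functions, and
``std::chrono`` is used for timing; absolute numbers will vary by
machine and compiler flags):

.. code-block:: c++

  #include <cassert>
  #include <chrono>
  #include <cmath>
  #include <cstdio>

  // Naive evaluation: every exp/pow recomputed, as in Rat43Analytic.
  void EvaluateNaive(const double b[4], double x, double jac[4]) {
    jac[0] = std::pow(1 + std::exp(b[1] - b[2] * x), -1.0 / b[3]);
    jac[1] = -b[0] * std::exp(b[1] - b[2] * x) *
             std::pow(1 + std::exp(b[1] - b[2] * x), -1.0 / b[3] - 1) / b[3];
    jac[2] = x * b[0] * std::exp(b[1] - b[2] * x) *
             std::pow(1 + std::exp(b[1] - b[2] * x), -1.0 / b[3] - 1) / b[3];
    jac[3] = b[0] * std::log(1 + std::exp(b[1] - b[2] * x)) *
             std::pow(1 + std::exp(b[1] - b[2] * x), -1.0 / b[3]) /
             (b[3] * b[3]);
  }

  // Cached evaluation: shared sub-expressions computed once, as in
  // Rat43AnalyticOptimized.
  void EvaluateCached(const double b[4], double x, double jac[4]) {
    const double t1 = std::exp(b[1] - b[2] * x);
    const double t2 = 1 + t1;
    const double t3 = std::pow(t2, -1.0 / b[3]);
    const double t4 = std::pow(t2, -1.0 / b[3] - 1);
    jac[0] = t3;
    jac[1] = -b[0] * t1 * t4 / b[3];
    jac[2] = -x * jac[1];
    jac[3] = b[0] * std::log(t2) * t3 / (b[3] * b[3]);
  }

  int main() {
    const double b[4] = {700.0, 5.0, 0.75, 1.3};
    double jn[4], jc[4];
    EvaluateNaive(b, 10.0, jn);
    EvaluateCached(b, 10.0, jc);
    for (int i = 0; i < 4; ++i) {
      // The two variants must agree to within floating point noise.
      assert(std::abs(jn[i] - jc[i]) < 1e-9 * (1.0 + std::abs(jn[i])));
    }

    using Clock = std::chrono::steady_clock;
    const int kIters = 100000;
    volatile double sink = 0;  // keeps the optimizer from deleting the loops
    auto t0 = Clock::now();
    for (int i = 0; i < kIters; ++i) {
      EvaluateNaive(b, 10.0, jn);
      sink = sink + jn[0];
    }
    auto t1 = Clock::now();
    for (int i = 0; i < kIters; ++i) {
      EvaluateCached(b, 10.0, jc);
      sink = sink + jc[0];
    }
    auto t2 = Clock::now();
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;
    std::printf("naive:  %lld ns/call\n",
                (long long)duration_cast<nanoseconds>(t1 - t0).count() / kIters);
    std::printf("cached: %lld ns/call\n",
                (long long)duration_cast<nanoseconds>(t2 - t1).count() / kIters);
    return 0;
  }

Compile such a benchmark with optimizations enabled (e.g. ``-O2``), or
the measurements will be meaningless.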
When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to symbolically
   differentiate the objective function and generate the C++ code to
   evaluate it.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work.  Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   and remember `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::
     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Inverse Function Theorem
   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_.

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.

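To make the polynomial-root case in point 4 concrete: if
:math:`z(x,y)` is a root, differentiating the identity
:math:`F(x, y, z(x,y)) = 0` with respect to :math:`x` gives
:math:`\partial z/\partial x = -\frac{\partial F/\partial x}{\partial
F/\partial z}`. The following standalone sketch (the coefficient
functions are chosen arbitrarily for illustration, and the root is
found by Newton's method) checks this against finite differences:

.. code-block:: c++

  #include <cassert>
  #include <cmath>
  #include <cstdio>

  // Arbitrary smooth coefficient functions, purely for illustration.
  double a3(double, double)     { return 1.0; }
  double a2(double x, double y) { return x * y; }
  double a1(double x, double y) { return x + y; }
  double a0(double x, double y) { return -std::exp(x) - y; }

  double F(double x, double y, double z) {
    return a3(x, y) * z * z * z + a2(x, y) * z * z + a1(x, y) * z + a0(x, y);
  }

  // dF/dz, needed both for Newton's method and for the root derivative.
  double Fz(double x, double y, double z) {
    return 3 * a3(x, y) * z * z + 2 * a2(x, y) * z + a1(x, y);
  }

  // Find a real root of F(x, y, z) = 0 in z by Newton's method.
  double Root(double x, double y) {
    double z = 1.0;
    for (int i = 0; i < 100; ++i) z -= F(x, y, z) / Fz(x, y, z);
    return z;
  }

  int main() {
    const double x = 0.5, y = 0.7;
    const double z = Root(x, y);
    assert(std::abs(F(x, y, z)) < 1e-12);

    // dz/dx = -F_x / F_z, where for these coefficients
    // F_x = y z^2 + z - exp(x).
    const double Fx = y * z * z + z - std::exp(x);
    const double dzdx = -Fx / Fz(x, y, z);

    // Check against a central difference of the root itself.
    const double h = 1e-6;
    const double numeric = (Root(x + h, y) - Root(x - h, y)) / (2 * h);
    assert(std::abs(numeric - dzdx) < 1e-5 * (1.0 + std::abs(dzdx)));
    std::printf("dz/dx = %.12f (matches central differences)\n", dzdx);
    return 0;
  }
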
.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.