.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::

   y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::

   \begin{align}
   E(b_1, b_2, b_3, b_4)
   &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
   &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
   \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::

   \begin{align}
   D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
   D_2 f(b_1, b_2, b_3, b_4; x,y) &=
   \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
   D_3 f(b_1, b_2, b_3, b_4; x,y) &=
   \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
   D_4 f(b_1, b_2, b_3, b_4; x,y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right)}{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
   \end{align}

With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
    public:
      Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43Analytic() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
        jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };

This is tedious code: it is hard to read and carries a lot of
redundancy. In practice we cache the common sub-expressions to improve
its efficiency, which gives us something like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
    public:
      Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43AnalyticOptimized() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        const double t1 = exp(b2 - b3 * x_);
        const double t2 = 1 + t1;
        const double t3 = pow(t2, -1.0 / b4);
        residuals[0] = b1 * t3 - y_;

        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        const double t4 = pow(t2, -1.0 / b4 - 1);
        jacobian[0] = t3;
        jacobian[1] = -b1 * t1 * t4 / b4;
        jacobian[2] = -x_ * jacobian[1];
        jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };

What is the difference in performance of these two implementations?

==========================   =========
CostFunction                 Time (ns)
==========================   =========
Rat43Analytic                      255
Rat43AnalyticOptimized              92
==========================   =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``. This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.

When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to symbolically
   differentiate the objective function and generate the C++ code to
   evaluate the derivatives.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work. Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   and remember `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::

      a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Inverse Function Theorem
   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_.

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.
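
For the polynomial-root example above, the derivative can be written
down via implicit differentiation (a standard calculus sketch, not
Ceres-specific): along the root :math:`z(x,y)` the polynomial is
identically zero, so differentiating with respect to :math:`x` gives,
wherever :math:`3a_3z^2 + 2a_2z + a_1 \neq 0`,

.. math::

   \frac{\partial z}{\partial x} =
   -\frac{z^3 \frac{\partial a_3}{\partial x} +
          z^2 \frac{\partial a_2}{\partial x} +
          z \frac{\partial a_1}{\partial x} +
          \frac{\partial a_0}{\partial x}}
         {3a_3z^2 + 2a_2z + a_1},

and similarly for :math:`\partial z/\partial y`.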

.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.