.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-numerical_derivatives:

===================
Numeric derivatives
===================

The other extreme from using analytic derivatives is to use numeric
derivatives. The key observation here is that the process of
differentiating a function :math:`f(x)` w.r.t. :math:`x` can be
written as the limiting process:

.. math::
   Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}

Forward Differences
===================

Now of course one cannot perform the limiting operation numerically
on a computer, so we do the next best thing, which is to choose a
small value of :math:`h` and approximate the derivative as

.. math::
   Df(x) \approx \frac{f(x + h) - f(x)}{h}

The above formula is the simplest, most basic form of numeric
differentiation. It is known as the *Forward Difference* formula.

So how would one go about constructing a numerically differentiated
version of ``Rat43Analytic`` (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) in
Ceres Solver? This is done in two steps:

  1. Define a *Functor* that given the parameter values will evaluate
     the residual for a given :math:`(x,y)`.
  2. Construct a :class:`CostFunction` by using
     :class:`NumericDiffCostFunction` to wrap an instance of
     ``Rat43CostFunctor``.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    bool operator()(const double* parameters, double* residuals) const {
      const double b1 = parameters[0];
      const double b2 = parameters[1];
      const double b3 = parameters[2];
      const double b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

    const double x_;
    const double y_;
  };

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
      new Rat43CostFunctor(x, y));

This is about the minimum amount of work one can expect to do to
define the cost function. The only thing that the user needs to do is
to make sure that the evaluation of the residual is implemented
correctly and efficiently.

Before going further, it is instructive to get an estimate of the
error in the forward difference formula. We do this by considering the
`Taylor expansion <https://en.wikipedia.org/wiki/Taylor_series>`_ of
:math:`f` near :math:`x`.

.. math::
   \begin{align}
   f(x+h) &= f(x) + h Df(x) + \frac{h^2}{2!} D^2f(x) +
   \frac{h^3}{3!}D^3f(x) + \cdots \\
   Df(x) &= \frac{f(x + h) - f(x)}{h} - \left [\frac{h}{2!}D^2f(x) +
   \frac{h^2}{3!}D^3f(x) + \cdots  \right]\\
   Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h)
   \end{align}

i.e., the error in the forward difference formula is
:math:`O(h)` [#f4]_.

Implementation Details
----------------------

:class:`NumericDiffCostFunction` implements a generic algorithm to
numerically differentiate a given functor. While the actual
implementation of :class:`NumericDiffCostFunction` is complicated, the
net result is a :class:`CostFunction` that roughly looks something
like the following:

.. code-block:: c++

  class Rat43NumericDiffForward : public SizedCostFunction<1, 4> {
   public:
    Rat43NumericDiffForward(Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43NumericDiffForward() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      (*functor_)(parameters[0], residuals);
      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double f = residuals[0];
      double parameters_plus_h[4];
      for (int i = 0; i < 4; ++i) {
        std::copy(parameters[0], parameters[0] + 4, parameters_plus_h);
        const double kRelativeStepSize = 1e-6;
        const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
        parameters_plus_h[i] += h;
        double f_plus;
        (*functor_)(parameters_plus_h, &f_plus);
        jacobian[i] = (f_plus - f) / h;
      }
      return true;
    }

   private:
    scoped_ptr<Rat43CostFunctor> functor_;
  };

Note the choice of step size :math:`h` in the above code: instead of
an absolute step size which is the same for all parameters, we use a
relative step size of :math:`\text{kRelativeStepSize} = 10^{-6}`. This
gives better derivative estimates than an absolute step size [#f2]_
[#f3]_. This choice of step size only works for parameter values that
are not close to zero. So the actual implementation of
:class:`NumericDiffCostFunction` uses a more complex step size
selection logic, where close to zero, it switches to a fixed step
size.

Central Differences
===================

The :math:`O(h)` error in the Forward Difference formula is okay but
not great. A better method is to use the *Central Difference* formula:

.. math::
   Df(x) \approx \frac{f(x + h) - f(x - h)}{2h}

Notice that if the value of :math:`f(x)` is known, the Forward
Difference formula only requires one extra evaluation, but the Central
Difference formula requires two evaluations, making it twice as
expensive. So is the extra evaluation worth it?

To answer this question, we again compute the error of approximation
in the central difference formula:

.. math::
   \begin{align}
   f(x + h) &= f(x) + h Df(x) + \frac{h^2}{2!}
   D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots\\
   f(x - h) &= f(x) - h Df(x) + \frac{h^2}{2!}
   D^2f(x) - \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) +
   \cdots\\
   Df(x) & =  \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) +  \frac{h^4}{5!}
   D^5f(x) + \cdots \\
   Df(x) & =  \frac{f(x + h) - f(x - h)}{2h} + O(h^2)
   \end{align}

The error of the Central Difference formula is :math:`O(h^2)`, i.e.,
the error goes down quadratically whereas the error in the Forward
Difference formula only goes down linearly.

Using central differences instead of forward differences in Ceres
Solver is a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
      new Rat43CostFunctor(x, y));

But what do these differences in the error mean in practice? To see
this, consider the problem of evaluating the derivative of the
univariate function

.. math::
   f(x) = \frac{e^x}{\sin x - x^2},

at :math:`x = 1.0`.

It is easy to determine that :math:`Df(1.0) =
140.73773557129658`. Using this value as a reference, we can now
compute the relative error in the forward and central difference
formulae as a function of the absolute step size and plot them.

.. figure:: forward_central_error.png
   :figwidth: 100%
   :align: center

Reading the graph from right to left, a number of things stand out:

 1. The graphs for both formulae have two distinct regions. At first,
    starting from a large value of :math:`h` the error goes down as
    the effect of truncating the Taylor series dominates, but as the
    value of :math:`h` continues to decrease, the error starts
    increasing again as roundoff error starts to dominate the
    computation. So we cannot just keep on reducing the value of
    :math:`h` to get better estimates of :math:`Df`. The fact that we
    are using finite precision arithmetic becomes a limiting factor.
 2. The Forward Difference formula is not a great method for
    evaluating derivatives. The Central Difference formula converges
    much more quickly to a more accurate estimate of the derivative
    with decreasing step size. So unless the evaluation of
    :math:`f(x)` is so expensive that you absolutely cannot afford the
    extra evaluation required by central differences, **do not use the
    Forward Difference formula**.
 3. Neither formula works well for a poorly chosen value of :math:`h`.

Ridders' Method
===============

So, can we get better estimates of :math:`Df` without requiring such
small values of :math:`h` that we start hitting floating point
roundoff errors?

One possible approach is to find a method whose error goes down faster
than :math:`O(h^2)`. This can be done by applying `Richardson
Extrapolation
<https://en.wikipedia.org/wiki/Richardson_extrapolation>`_ to the
problem of differentiation. This is also known as *Ridders' Method*
[Ridders]_.

Let us recall the error in the central difference formula:

.. math::
   \begin{align}
   Df(x) & =  \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) +  \frac{h^4}{5!}
   D^5f(x) + \cdots\\
         & =  \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots
   \end{align}

The key thing to note here is that the terms :math:`K_2, K_4, \cdots`
are independent of :math:`h` and only depend on :math:`x`.

Let us now define:

.. math::

   A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}.

Then observe that

.. math::

   Df(x) = A(1,1) + K_2 h^2 + K_4 h^4 + \cdots

and

.. math::

   Df(x) = A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots

Here we have halved the step size to obtain a second central
difference estimate of :math:`Df(x)`. Combining these two estimates,
we get:

.. math::

   Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4)

which is an approximation of :math:`Df(x)` with truncation error that
goes down as :math:`O(h^4)`. But we do not have to stop here. We can
iterate this process to obtain even more accurate estimates as
follows:

.. math::

   A(n, m) =  \begin{cases}
    \frac{\displaystyle f(x + h/2^{m-1}) - f(x -
    h/2^{m-1})}{\displaystyle 2h/2^{m-1}} & n = 1 \\
   \frac{\displaystyle 4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{\displaystyle 4^{n-1} - 1} & n > 1
   \end{cases}

It is straightforward to show that the approximation error in
:math:`A(n, 1)` is :math:`O(h^{2n})`. To see how the above formula can
be implemented in practice to compute :math:`A(n,1)` it is helpful to
structure the computation as the following tableau:

.. math::
   \begin{array}{ccccc}
   A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\
          & A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\
          &         & A(3, 1) & A(3, 2) & \cdots\\
          &         &         & A(4, 1) & \cdots \\
          &         &         &         & \ddots
   \end{array}

So, to compute :math:`A(n, 1)` for increasing values of :math:`n` we
move from the left to the right, computing one column at a
time. Assuming that the primary cost here is the evaluation of the
function :math:`f(x)`, the cost of computing a new column of the above
tableau is two function evaluations: computing :math:`A(1, m)` at the
top of the column requires evaluating the central difference formula
for a step size of :math:`2^{1-m}h`, and the remaining entries of the
column follow from the recurrence without any further evaluations of
:math:`f`.

Applying this method to :math:`f(x) = \frac{e^x}{\sin x - x^2}`
starting with a fairly large step size :math:`h = 0.01`, we get:

.. math::
   \begin{array}{rrrrr}
   141.678097131 &140.971663667 &140.796145400 &140.752333523 &140.741384778\\
   &140.736185846 &140.737639311 &140.737729564 &140.737735196\\
   & &140.737736209 &140.737735581 &140.737735571\\
   & & &140.737735571 &140.737735571\\
   & & & &140.737735571\\
   \end{array}

Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
:math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
comparison, the relative error for the central difference formula with
the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.

The above tableau is the basis of Ridders' method for numeric
differentiation. The full implementation is an adaptive scheme that
tracks its own estimation error and stops automatically when the
desired precision is reached. Of course it is more expensive than the
forward and central difference formulae, but is also significantly
more robust and accurate.

Using Ridders' method instead of forward or central differences in
Ceres is again a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>(
      new Rat43CostFunctor(x, y));

The following graph shows the relative error of the three methods as a
function of the absolute step size. For Ridders' method we assume
that the step size for evaluating :math:`A(n,1)` is :math:`2^{1-n}h`.

.. figure:: forward_central_ridders_error.png
   :figwidth: 100%
   :align: center

Using the 10 function evaluations that are needed to compute
:math:`A(5,1)` we are able to approximate :math:`Df(1.0)` about 1000
times better than the best central difference estimate. To put these
numbers in perspective, machine epsilon for double precision
arithmetic is :math:`\approx 2.22 \times 10^{-16}`.

Going back to ``Rat43``, let us also look at the runtime cost of the
various methods for computing numeric derivatives.

==========================   =========
CostFunction                 Time (ns)
==========================   =========
Rat43Analytic                      255
Rat43AnalyticOptimized              92
Rat43NumericDiffForward            262
Rat43NumericDiffCentral            517
Rat43NumericDiffRidders           3760
==========================   =========

As expected, Central Differences is about twice as expensive as
Forward Differences, and the remarkable accuracy improvements of
Ridders' method cost an order of magnitude more runtime.

Recommendations
===============

Numeric differentiation should be used when you cannot compute the
derivatives either analytically or using automatic differentiation.
This is usually the case when you are calling an external library or
function whose analytic form you do not know, or even if you do, you
are not in a position to re-write it in the manner required to use
:ref:`chapter-automatic_derivatives`.

When using numeric differentiation, use at least Central Differences,
and if execution time is not a concern, or the objective function is
such that determining a good static relative step size is hard,
Ridders' method is recommended.

.. rubric:: Footnotes

.. [#f2] `Numerical Differentiation
         <https://en.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic>`_
.. [#f3] [Press]_ Numerical Recipes, Section 5.7
.. [#f4] In asymptotic error analysis, an error of :math:`O(h^k)`
         means that the absolute value of the error is at most some
         constant times :math:`h^k` when :math:`h` is close enough to
         :math:`0`.