File levenberg_marquardt_solver.h

namespace sym
template<typename ScalarType, typename _LinearSolverType = sym::SparseCholeskySolver<Eigen::SparseMatrix<ScalarType>>>
class LevenbergMarquardtSolver
#include <levenberg_marquardt_solver.h>

Fast Levenberg-Marquardt solver for nonlinear least squares problems specified by a linearization function. Supports on-manifold optimization and sparse solving, and attempts to minimize allocations after the first iteration.

This assumes the problem structure is the same for the lifetime of the object - if the problem structure changes, create a new LevenbergMarquardtSolver.

TODO(aaron): Analyze what repeated allocations we do have, and get rid of them if possible

Not thread safe! Create one per thread.

Example usage:

constexpr const int M = 9;
constexpr const int N = 5;

// Create a function that computes the residual (a linear residual for this example)
const auto J_MN = sym::RandomNormalMatrix<double, M, N>(gen);
const auto linearize_func = [&J_MN](const sym::Valuesd& values,
                                    sym::SparseLinearizationd* const linearization) {
  const auto state_vec = values.At<sym::Vector5d>('v');
  linearization->residual = J_MN * state_vec;
  linearization->hessian_lower = (J_MN.transpose() * J_MN).sparseView();
  linearization->jacobian = J_MN.sparseView();
  linearization->rhs = J_MN.transpose() * linearization->residual;
};

// Create a Values
sym::Valuesd values_init{};
values_init.Set('v', (StateVector::Ones() * 100).eval());

// Create a Solver
sym::LevenbergMarquardtSolverd solver(params, "", kEpsilon);
solver.SetIndex(values_init.CreateIndex({'v'}));
solver.Reset(values_init);

// Iterate to convergence
sym::optimization_stats_t stats;
bool should_early_exit = false;
while (!should_early_exit) {
  should_early_exit = solver.Iterate(linearize_func, &stats);
}

// Get the best values
sym::Valuesd values_final = solver.GetBestValues();

The theory:

We start with a nonlinear vector-valued error function that defines an error residual for which we want to minimize the squared norm. The residual is dimension M, the state is N.

residual = f(x)

Define a least squares cost function as the squared norm of the residual:

e(x) = 0.5 * ||f(x)||**2 = 0.5 * f(x).T * f(x)

Take the first order taylor expansion for x around the linearization point x0:

f(x) = f(x0) + f'(x0) * (x - x0) + ...

Plug in to the cost function to get a quadratic:

e(x) ~= 0.5 * (x - x0).T * f'(x0).T * f'(x0) * (x - x0) + f(x0).T * f'(x0) * (x - x0)
        + 0.5 * f(x0).T * f(x0)

Take derivative with respect to x:

e'(x) = f'(x0).T * f'(x0) * (x - x0) + f'(x0).T * f(x0)

Set to zero to find the minimum value of the quadratic (paraboloid):

0 = f'(x0).T * f'(x0) * (x - x0) + f'(x0).T * f(x0)
(x - x0) = - inv(f'(x0).T * f'(x0)) * f'(x0).T * f(x0)
x = x0 - inv(f'(x0).T * f'(x0)) * f'(x0).T * f(x0)

Another way to write this is to create some helpful shorthand:

f'(x0) --> jacobian or J (shape = MxN)
f (x0) --> bias or b     (shape = Mx1)
x - x0 --> dx            (shape = Nx1)

Rederiving the Gauss-Newton solution:

e(x) ~= 0.5 * dx.T * J.T * J * dx + b.T * J * dx + 0.5 * b.T * b
e'(x) = J.T * J * dx + J.T * b
x = x0 - inv(J.T * J) * J.T * b

A couple more common names:

f'(x0).T * f'(x0) = J.T * J --> hessian approximation or H (shape = NxN)
f'(x0).T * f (x0) = J.T * b --> right hand side or rhs     (shape = Nx1)

So the iteration loop for optimization is:

J, b = linearize(f, x0)
dx = -inv(J.T * J) * J.T * b
x_new = x0 + dx

Technically what we’ve just described is the Gauss-Newton algorithm; the Levenberg-Marquardt algorithm is based on Gauss-Newton, but adds a term to J.T * J before inverting to make sure the matrix is invertible and make the optimization more stable. This additional term typically takes the form lambda * I, or lambda * diag(J.T * J), where lambda is another parameter updated by the solver at each iteration. Configuration of how this term is computed can be found in the optimizer params.

Public Types

using Scalar = ScalarType
using LinearSolverType = _LinearSolverType
using MatrixType = typename LinearSolverType::MatrixType
using StateType = internal::LevenbergMarquardtState<MatrixType>
using LinearizationType = Linearization<MatrixType>
using FailureReason = levenberg_marquardt_solver_failure_reason_t
using LinearizeFunc = std::function<void(const Values<Scalar>&, LinearizationType&)>

Public Functions

inline LevenbergMarquardtSolver(const optimizer_params_t &p, const std::string &id, const Scalar epsilon)
inline LevenbergMarquardtSolver(const optimizer_params_t &p, const std::string &id, const Scalar epsilon, const LinearSolverType &linear_solver)
inline void SetIndex(const index_t &index)
inline void Reset(const Values<Scalar> &values)
inline void ResetState(const Values<Scalar> &values)
inline const optimizer_params_t &Params() const
void UpdateParams(const optimizer_params_t &p)
inline const LinearSolverType &LinearSolver() const
inline LinearSolverType &LinearSolver()
optional<std::pair<optimization_status_t, FailureReason>> Iterate(const LinearizeFunc &func, OptimizationStats<MatrixType> &stats)
inline const Values<Scalar> &GetBestValues() const
inline const LinearizationType &GetBestLinearization() const
void ComputeCovariance(const MatrixType &hessian_lower, MatrixX<Scalar> &covariance)

Private Functions

void DampHessian(MatrixType &hessian_lower, bool &have_max_diagonal, VectorX<Scalar> &max_diagonal, Scalar lambda, VectorX<Scalar> &damping_vector, VectorX<Scalar> &undamped_diagonal) const
void CheckHessianDiagonal(const MatrixType &hessian_lower_damped, Scalar lambda)
void PopulateIterationStats(optimization_iteration_t &iteration_stats, const StateType &state, Scalar new_error, Scalar new_error_linear, Scalar relative_reduction, Scalar gain_ratio) const
void Update(const Values<Scalar> &values, const index_t &index, const VectorX<Scalar> &update, Values<Scalar> &updated_values) const

Private Members

optimizer_params_t p_
std::string id_
Scalar epsilon_
StateType state_
LinearSolverType linear_solver_ = {}
bool solver_analyzed_ = {false}
bool have_max_diagonal_ = {false}
VectorX<Scalar> max_diagonal_
Scalar current_lambda_
Scalar current_nu_
bool have_last_update_ = {false}
VectorX<Scalar> last_update_
int iteration_ = {-1}
VectorX<Scalar> update_
VectorX<Scalar> damping_vector_
VectorX<Scalar> undamped_diagonal_
Eigen::Array<bool, Eigen::Dynamic, 1> zero_diagonal_
std::vector<int> zero_diagonal_indices_
index_t index_ = {}