Algorithms implemented in torch_numopt

Gradient Descent with Line Search

class GradientDescentLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', **kwargs)

Bases: LineSearchOptimizer

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
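
To illustrate how lr_init, c1 and tau interact, here is a minimal sketch of backtracking line search under the Armijo condition. It is written in plain Python and is not the library's implementation; the function name `backtracking_armijo`, the `max_iter` guard and the test problem are assumptions made for this example.

```python
from typing import Callable, List

Vector = List[float]


def backtracking_armijo(
    f: Callable[[Vector], float],
    x: Vector,
    grad: Vector,
    direction: Vector,
    lr_init: float = 1.0,
    c1: float = 1e-4,
    tau: float = 0.1,
    max_iter: int = 50,  # hypothetical safety cap, not a documented parameter
) -> float:
    """Shrink lr until f(x + lr * p) <= f(x) + c1 * lr * <grad, p>."""
    f0 = f(x)
    # Directional derivative along p; negative for a descent direction.
    slope = sum(g * p for g, p in zip(grad, direction))
    lr = lr_init
    for _ in range(max_iter):
        trial = [xi + lr * pi for xi, pi in zip(x, direction)]
        if f(trial) <= f0 + c1 * lr * slope:
            break  # Armijo condition holds, accept this step size
        lr *= tau  # otherwise reduce the step size by the factor tau
    return lr


def quadratic(v: Vector) -> float:
    """Test objective f(x, y) = x^2 + 10 y^2."""
    return v[0] ** 2 + 10.0 * v[1] ** 2


x0 = [1.0, 1.0]
g0 = [2.0, 20.0]          # gradient of the objective at x0
p0 = [-g for g in g0]     # steepest descent direction
step = backtracking_armijo(quadratic, x0, g0, p0)
x1 = [xi + step * pi for xi, pi in zip(x0, p0)]
```

With these inputs the full step lr_init = 1 overshoots, so the step is reduced once by tau before the condition is met. A larger tau shrinks the step more gently at the cost of more function evaluations.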

get_step_direction(d_p_list, h_list)

Obtains the step direction used to update the network.

Parameters:

d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.

Returns:

p – New search direction

Return type:

list

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

Newton’s method with Line Search

class NewtonLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, damping: str = 'none', mu: float = 1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)

Bases: SecondOrderOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
damping (str) – Damping strategy, which determines the diagonal matrix (the identity or the diagonal of the Hessian matrix) added to the Hessian matrix. Defaults to “none”.
mu (float) – Initial value for the coefficient used when adding a diagonal matrix to the Hessian matrix.
mu_dec (float) – Factor by which to decrease the coefficient of the diagonal matrix when the previous iteration improved the model.
mu_max (float) – Maximum value allowed for the coefficient of the diagonal matrix.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the Hessian.
batch_size (int) – Number of samples used at a time when computing the Hessian matrix.
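
As an illustration of the role of mu, the following sketch computes a damped Newton direction by solving (H + mu I) p = -g. It is plain Python with a small Gaussian elimination routine and does not use the library's solver; `solve` and `newton_direction` are hypothetical helper names, and damping with the identity matrix is an assumption made for the example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_direction(grad: Vector, hessian: Matrix, mu: float = 0.0) -> Vector:
    """Return p solving (H + mu * I) p = -g."""
    n = len(grad)
    damped = [
        [hessian[i][j] + (mu if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(damped, [-g for g in grad])


# f(x, y) = x^2 + 10 y^2 evaluated at (1, 1)
g = [2.0, 20.0]
h = [[2.0, 0.0], [0.0, 20.0]]
p_newton = newton_direction(g, h)           # undamped: reaches the minimizer in one step
p_damped = newton_direction(g, h, mu=2.0)   # damped: a shorter, more conservative step
```

On a quadratic objective the undamped direction lands exactly on the minimizer; increasing mu shortens the step and turns it toward the negative gradient.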

get_step_direction(d_p_list, h_list)

Obtains the step direction used to update the network.

Parameters:

d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.

Returns:

p – New search direction

Return type:

list

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

Gauss-Newton algorithm with Line Search

class GaussNewtonLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)

Bases: SecondOrderOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the Hessian.
batch_size (int) – Number of samples used at a time when computing the Hessian matrix.
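
For context, the Gauss-Newton method replaces the Hessian of a least-squares loss 0.5 * sum(r_i^2) with J^T J, where J is the Jacobian of the residuals r. The sketch below takes one such step for a two-parameter linear model in plain Python; `gauss_newton_step` and the data are assumptions for illustration and do not reflect how the library computes Jacobians.

```python
from typing import List, Tuple


def gauss_newton_step(jac: List[List[float]], res: List[float]) -> Tuple[float, float]:
    """Solve (J^T J) p = -J^T r for a model with two parameters."""
    # Entries of the symmetric 2x2 matrix A = J^T J.
    a00 = sum(row[0] * row[0] for row in jac)
    a01 = sum(row[0] * row[1] for row in jac)
    a11 = sum(row[1] * row[1] for row in jac)
    # Gradient of the loss, g = J^T r.
    g0 = sum(row[0] * r for row, r in zip(jac, res))
    g1 = sum(row[1] * r for row, r in zip(jac, res))
    # Closed-form solution of the 2x2 system A p = -g.
    det = a00 * a11 - a01 * a01
    p0 = (-a11 * g0 + a01 * g1) / det
    p1 = (-a00 * g1 + a01 * g0) / det
    return p0, p1


# Fit y = w0 * x + w1 to points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
w = [0.0, 0.0]

res = [w[0] * x + w[1] - y for x, y in zip(xs, ys)]  # residuals r_i
jac = [[x, 1.0] for x in xs]                          # d r_i / d (w0, w1)
p = gauss_newton_step(jac, res)
w_new = [w[0] + p[0], w[1] + p[1]]
```

Because the residuals here are linear in the parameters, J^T J equals the exact Hessian and a single full step recovers the least-squares solution. For nonlinear models the approximation is only exact when the residuals vanish, which is why a line search or damping is usually added.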

get_step_direction(d_p_list, h_list)

Obtains the step direction used to update the network.

Parameters:

d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.

Returns:

p – New search direction

Return type:

list

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

Levenberg-Marquardt algorithm with Line Search

class LevenbergMarquardtLS(model: Module, lr_init: float = 1, lr_method: str | None = None, mu: float = 0.001, mu_dec: float = 0.1, mu_max: float = 10000000000.0, fletcher: bool = False, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)

Bases: SecondOrderOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py and the MATLAB implementation of ‘trainlm’, https://es.mathworks.com/help/deeplearning/ref/trainlm.html#d126e69092

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
mu (float) – Initial value for the coefficient used when adding a diagonal matrix to the Hessian approximation.
mu_dec (float) – Factor by which to decrease the coefficient of the diagonal matrix when the previous iteration improved the model.
mu_max (float) – Maximum value allowed for the coefficient of the diagonal matrix.
fletcher (bool) – Whether to use the diagonal of the Hessian approximation instead of an identity matrix to damp the Hessian approximation.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the Hessian.
batch_size (int) – Number of samples used at a time when computing the Hessian matrix.

get_step_direction(d_p_list, h_list)

Obtains the step direction used to update the network.

Parameters:

d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.

Returns:

p – New search direction

Return type:

list

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

update(loss)

Function to update the internal parameters of the optimization procedure.

Parameters:

loss (float) – Loss of the neural network with the new parameters.
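
A common way to adapt the damping coefficient in Levenberg-Marquardt, following the convention of MATLAB's trainlm, is sketched below. This is an assumption about the general scheme and not a transcription of this library's update method: the helper name `update_mu`, the `prev_loss` argument and the choice to increase mu by dividing by mu_dec are all hypothetical.

```python
def update_mu(
    mu: float,
    loss: float,
    prev_loss: float,
    mu_dec: float = 0.1,
    mu_max: float = 1e10,
) -> float:
    """Return the damping coefficient to use in the next iteration."""
    if loss < prev_loss:
        # The step improved the model: trust the Gauss-Newton direction more.
        return mu * mu_dec
    # The step did not help: move toward gradient descent, capped at mu_max.
    # Assumption: the increase factor is taken to be 1 / mu_dec.
    return min(mu / mu_dec, mu_max)


mu_after_success = update_mu(0.001, loss=1.0, prev_loss=2.0)
mu_after_failure = update_mu(0.001, loss=3.0, prev_loss=2.0)
mu_capped = update_mu(1e10, loss=3.0, prev_loss=2.0)
```

A small mu makes the step close to a Gauss-Newton step, while a large mu makes it resemble a short gradient descent step, so the coefficient acts as a dial between the two methods.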

Conjugate Gradient algorithm with Line Search

class ConjugateGradientLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', cg_method: str = 'PRP+', **kwargs)

Bases: LineSearchOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py, https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf and https://arxiv.org/abs/2201.08568

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
cg_method (str) – Formula used to calculate the conjugate gradient, options are “FR”, “PR” and “PRP+”.
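
The cg_method options correspond to standard formulas for the conjugate gradient coefficient beta: Fletcher-Reeves (“FR”), Polak-Ribière (“PR”) and its non-negative variant (“PRP+”). The sketch below shows the textbook definitions in plain Python; the helper names `cg_beta` and `cg_direction` are hypothetical and the library's internals may differ.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cg_beta(g_new: Vector, g_old: Vector, method: str = "PRP+") -> float:
    """Conjugate gradient coefficient from the current and previous gradients."""
    denom = dot(g_old, g_old)
    if method == "FR":
        # Fletcher-Reeves: ratio of squared gradient norms.
        return dot(g_new, g_new) / denom
    # Polak-Ribiere: uses the change in gradient between iterations.
    pr = dot(g_new, [a - b for a, b in zip(g_new, g_old)]) / denom
    if method == "PR":
        return pr
    if method == "PRP+":
        # Clipping at zero restarts with steepest descent when PR is negative.
        return max(0.0, pr)
    raise ValueError(f"unknown cg_method: {method}")


def cg_direction(g_new: Vector, g_old: Vector, p_old: Vector, method: str = "PRP+") -> Vector:
    """New search direction p = -g_new + beta * p_old."""
    beta = cg_beta(g_new, g_old, method)
    return [-g + beta * p for g, p in zip(g_new, p_old)]


g_prev = [2.0, 0.0]
g_curr = [1.0, 0.0]
betas = {m: cg_beta(g_curr, g_prev, m) for m in ("FR", "PR", "PRP+")}
p_next = cg_direction(g_curr, g_prev, [-2.0, 0.0], "PRP+")
```

In this example the Polak-Ribière coefficient is negative, so “PRP+” clips it to zero and the new direction reduces to the negative gradient.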

get_step_direction(d_p_list, h_list=None)

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

AdaHessian

class AdaHessian(model: Module, lr_init: float = 1, lr_method: str | None = None, beta1=0.9, beta2=0.999, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, k: float = 1, line_search_method: str = 'const', line_search_cond: str = 'armijo', **kwargs)

Bases: SecondOrderOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py

Parameters:

model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.
lr_method (str) – Method used to initialize the learning rate before applying line search.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.
c2 (float) – Coefficient used in the second (curvature) condition of the Wolfe conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
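
The signature also accepts beta1, beta2 and k, which are not described above. In the AdaHessian algorithm as published, these are the decay rates of the first and second moment averages and the power applied to the Hessian-based scaling. The sketch below shows that update in plain Python under the assumption that the library follows the same roles; `adahessian_step`, the `eps` term and the explicit Hessian diagonal are hypothetical choices for this example.

```python
import math
from typing import List, Tuple

Vector = List[float]


def adahessian_step(
    params: Vector,
    grad: Vector,
    hess_diag: Vector,
    m: Vector,
    v: Vector,
    t: int,
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    k: float = 1.0,
    eps: float = 1e-8,  # assumed small constant to avoid division by zero
) -> Tuple[Vector, Vector, Vector]:
    """One AdaHessian-style update; t is the 1-based iteration count."""
    new_params, new_m, new_v = [], [], []
    for p, g, d, mi, vi in zip(params, grad, hess_diag, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # moving average of gradients
        vi = beta2 * vi + (1.0 - beta2) * d * d    # moving average of squared Hessian diagonal
        m_hat = mi / (1.0 - beta1 ** t)            # bias corrections
        v_hat = vi / (1.0 - beta2 ** t)
        denom = math.sqrt(v_hat) ** k + eps        # k controls how strongly curvature rescales the step
        new_params.append(p - lr * m_hat / denom)
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# f(x, y) = x^2 + 10 y^2 at (1, 1): gradient (2, 20), Hessian diagonal (2, 20).
theta, m1, v1 = adahessian_step(
    [1.0, 1.0], [2.0, 20.0], [2.0, 20.0], m=[0.0, 0.0], v=[0.0, 0.0], t=1
)
```

With k = 1 both coordinates move by the same amount even though their gradients differ by a factor of ten, because each gradient is divided by the matching curvature. In practice the Hessian diagonal is usually estimated stochastically instead of being computed exactly.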

get_step_direction(d_p_list, h_list)

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)

Obtains the scaling matrix used to compute the step direction.

Parameters:

x (Tensor) – Input data.
y (Tensor) – Target values.
loss_fn (Module) – Loss function.

Returns:

h_list – List of scaling matrices of the parameters

Return type:

list

Quasi-Newton

(Not implemented)