Algorithms implemented in torch_numopt
Gradient Descent with Line Search
- class GradientDescentLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', **kwargs)
Bases: LineSearchOptimizer
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
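As a rough illustration of how lr_init, c1, c2, tau and line_search_cond interact, the sketch below implements backtracking along a descent direction with the four acceptance conditions listed above, on plain Python lists rather than tensors. It is not the library's implementation: the function names `accepts`, `backtracking` and `gradient_descent_ls` are hypothetical, and reusing c1 as the Goldstein constant is an assumption.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, p: Vector, x: Vector) -> Vector:
    """Return x + alpha * p."""
    return [xi + alpha * pi for xi, pi in zip(x, p)]


def accepts(cond: str, f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
            x: Vector, p: Vector, alpha: float, c1: float, c2: float) -> bool:
    """Check whether step size alpha satisfies the chosen line search condition."""
    f0 = f(x)
    slope0 = dot(grad(x), p)  # directional derivative at alpha = 0 (< 0 for a descent direction)
    x_new = axpy(alpha, p, x)
    f_new = f(x_new)
    sufficient_decrease = f_new <= f0 + c1 * alpha * slope0  # Armijo condition
    if cond == "armijo":
        return sufficient_decrease
    slope_new = dot(grad(x_new), p)
    if cond == "wolfe":
        return sufficient_decrease and slope_new >= c2 * slope0
    if cond == "strong-wolfe":
        return sufficient_decrease and abs(slope_new) <= c2 * abs(slope0)
    if cond == "goldstein":
        # Assumption: c1 doubles as the Goldstein constant (0 < c1 < 1/2).
        return f0 + (1 - c1) * alpha * slope0 <= f_new <= f0 + c1 * alpha * slope0
    raise ValueError(f"unknown line search condition: {cond}")


def backtracking(f, grad, x, p, lr_init=1.0, c1=1e-4, c2=0.9, tau=0.1,
                 cond="armijo", max_iter=50) -> float:
    """Start from lr_init and multiply the step by tau until the condition holds."""
    alpha = lr_init
    for _ in range(max_iter):
        if accepts(cond, f, grad, x, p, alpha, c1, c2):
            return alpha
        alpha *= tau
    return alpha  # no acceptable step within max_iter reductions; return the last reduced value


def gradient_descent_ls(f, grad, x, steps=50, **ls_kwargs) -> Vector:
    """Steepest descent, choosing each step size with backtracking."""
    for _ in range(steps):
        p = [-gi for gi in grad(x)]  # steepest-descent direction
        alpha = backtracking(f, grad, x, p, **ls_kwargs)
        x = axpy(alpha, p, x)
    return x
```

For f(x) = x0² + 10·x1² at x = (1, 1) with p = -∇f(x) = (-2, -20), the Armijo, Wolfe and Goldstein conditions accept the step 0.1, whereas the strong Wolfe condition keeps shrinking it to 0.01.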
Newton’s method with Line Search
- class NewtonLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, damping: str = 'none', mu: float = 1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)
Bases: SecondOrderOptimizer
Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
damping (str) – Damping strategy used to adjust the Hessian matrix by adding a diagonal matrix to it, either an identity matrix or the diagonal of the Hessian (default 'none').
mu (float) – Initial value for the coefficient used when adding a diagonal matrix to the Hessian matrix.
mu_dec (float) – Factor by which the coefficient of the diagonal matrix is decreased when the previous iteration improved the model.
mu_max (float) – Maximum value of the coefficient of the diagonal matrix, which is increased when the previous iteration did not improve the model.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the Hessian matrix, or to solve the corresponding linear system.
batch_size (int) – Number of samples processed at a time when computing the Hessian matrix.
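A Newton step solves H p = -g for the search direction p, where g is the gradient and H the Hessian; damping adds mu times a diagonal matrix (the identity or the Hessian's own diagonal) before solving. The sketch below shows that computation on a small dense matrix in plain Python. The function names and the boolean `use_diagonal` switch are hypothetical stand-ins for the library's `damping` option, whose accepted string values are not listed here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def newton_direction(grad: Vector, hess: Matrix, mu: float = 0.0,
                     use_diagonal: bool = False) -> Vector:
    """Return p solving (H + mu * D) p = -g, with D = I or D = diag(H).

    use_diagonal is a hypothetical flag, not the library's `damping` argument.
    """
    A = [row[:] for row in hess]  # copy so the caller's Hessian is not modified
    for i in range(len(grad)):
        A[i][i] += mu * (hess[i][i] if use_diagonal else 1.0)
    return solve(A, [-g for g in grad])
```

On f(x) = x0² + 10·x1² at (1, 1), the undamped direction is (-1, -1), so a unit step reaches the minimizer; increasing mu pulls the direction toward the scaled steepest-descent step -g/mu.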
Gauss-Newton algorithm with Line Search
- class GaussNewtonLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)
Bases: SecondOrderOptimizer
Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the Gauss-Newton approximation of the Hessian, or to solve the corresponding linear system.
batch_size (int) – Number of samples processed at a time when computing the Hessian approximation.
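Gauss-Newton replaces the Hessian of a least-squares loss 0.5·‖r(θ)‖² with JᵀJ, where J is the Jacobian of the residuals r, and solves JᵀJ p = -Jᵀr for the direction. The sketch below, in plain Python with hypothetical function names, fits a straight line y = a·t + b; because that model is linear in its parameters, a single full step lands on the least-squares solution.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky_solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b for symmetric positive-definite A via A = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gauss_newton_direction(jac: Matrix, res: Vector) -> Vector:
    """Return p solving (J^T J) p = -J^T r."""
    n = len(jac[0])
    jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]
    jtr = [sum(row[i] * ri for row, ri in zip(jac, res)) for i in range(n)]
    return cholesky_solve(jtj, [-v for v in jtr])


def fit_line(ts: Vector, ys: Vector, theta: Vector, steps: int = 1) -> Vector:
    """Fit y = a * t + b with full Gauss-Newton steps (learning rate 1)."""
    for _ in range(steps):
        a, b = theta
        res = [a * t + b - y for t, y in zip(ts, ys)]  # residuals r_i
        jac = [[t, 1.0] for t in ts]  # rows: d r_i / d(a, b)
        p = gauss_newton_direction(jac, res)
        theta = [th + pi for th, pi in zip(theta, p)]
    return theta
```

For data sampled exactly from y = 2t + 1 at t = 0, 1, 2, one step from (0, 0) recovers (a, b) = (2, 1). For models that are nonlinear in θ, several steps (and a line search on the step length) are normally needed.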
Levenberg-Marquardt algorithm with Line Search
- class LevenbergMarquardtLS(model: Module, lr_init: float = 1, lr_method: str | None = None, mu: float = 0.001, mu_dec: float = 0.1, mu_max: float = 10000000000.0, fletcher: bool = False, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', solver: str = 'solve', batch_size: int | None = None, **kwargs)
Bases: SecondOrderOptimizer
Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py and the MATLAB implementation of ‘trainlm’ https://es.mathworks.com/help/deeplearning/ref/trainlm.html#d126e69092
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
mu (float) – Initial value for the coefficient used when adding a diagonal matrix to the Hessian approximation.
mu_dec (float) – Factor by which the coefficient of the diagonal matrix is decreased when the previous iteration improved the model.
mu_max (float) – Maximum value of the coefficient of the diagonal matrix, which is increased when the previous iteration did not improve the model.
fletcher (bool) – Whether to use the diagonal of the Hessian approximation instead of an identity matrix to adjust the Hessian approximation (Fletcher’s modification).
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
solver (str) – Method used to invert the damped Hessian approximation, or to solve the corresponding linear system.
batch_size (int) – Number of samples processed at a time when computing the Hessian approximation.
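The roles of mu, mu_dec and mu_max can be illustrated with the adaptation rule of MATLAB's trainlm, which this class cites: mu shrinks after a step that lowers the loss (moving toward Gauss-Newton), grows after a rejected step (moving toward gradient descent), and training stops once mu exceeds mu_max. The sketch below is hypothetical; in particular `mu_inc` is not a parameter of LevenbergMarquardtLS, and its value of 10 is an assumption borrowed from trainlm's default.

```python
from typing import Tuple


def update_mu(mu: float, improved: bool, mu_dec: float = 0.1,
              mu_inc: float = 10.0, mu_max: float = 1e10) -> Tuple[float, bool]:
    """Return the next damping coefficient and whether to stop.

    mu_inc is an assumption (trainlm's default), not a LevenbergMarquardtLS parameter.
    """
    if improved:
        return mu * mu_dec, False  # accepted step: trust the Gauss-Newton model more
    mu = mu * mu_inc               # rejected step: lean toward gradient descent
    return mu, mu > mu_max         # stop once mu exceeds its upper bound


# Example trace starting from the default mu = 0.001:
# two accepted steps followed by three rejected ones.
mu, stop = 0.001, False
for improved in (True, True, False, False, False):
    mu, stop = update_mu(mu, improved)
```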
- get_step_direction(d_p_list, h_list)
Obtains the step direction used to update the network.
- Parameters:
d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.
- Returns:
p – New search direction
- Return type:
list
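For reference, the Levenberg-Marquardt direction solves (JᵀJ + mu·D) p = -Jᵀr, where D is the identity or, with Fletcher's modification, the diagonal of JᵀJ. The sketch below computes it from an explicit Jacobian and residual vector in plain Python. It does not reproduce get_step_direction's list-of-gradients interface; `lm_direction` and its arguments are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky_solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b for symmetric positive-definite A via A = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def lm_direction(jac: Matrix, res: Vector, mu: float, fletcher: bool = False) -> Vector:
    """Solve (J^T J + mu * D) p = -J^T r, with D = I or D = diag(J^T J)."""
    n = len(jac[0])
    jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]
    grad = [sum(row[i] * ri for row, ri in zip(jac, res)) for i in range(n)]  # J^T r
    for i in range(n):
        # Fletcher's modification scales the damping by the diagonal of J^T J.
        jtj[i][i] += mu * (jtj[i][i] if fletcher else 1.0)
    return cholesky_solve(jtj, [-g for g in grad])
```

With mu = 0 this reduces to the Gauss-Newton direction; as mu grows, p approaches -Jᵀr/mu, a short step along the negative gradient of 0.5·‖r‖².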
Conjugate Gradient algorithm with Line Search
- class ConjugateGradientLS(model: Module, lr_init: float = 1, lr_method: str | None = None, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, line_search_method: str = 'backtrack', line_search_cond: str = 'armijo', cg_method: str = 'PRP+', **kwargs)
Bases: LineSearchOptimizer
Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py, https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf and https://arxiv.org/abs/2201.08568
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
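cg_method (str) – Formula used to compute the conjugate gradient coefficient β, options are “FR” (Fletcher-Reeves), “PR” (Polak-Ribière) and “PRP+” (Polak-Ribière-Polyak with negative values of β set to zero).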
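The three cg_method options differ only in the coefficient β used to build the next direction p_new = -g_new + β·p. The sketch below writes out the standard Fletcher-Reeves, Polak-Ribière and PRP+ formulas and runs nonlinear CG on a quadratic 0.5·xᵀAx - bᵀx, where it converges in n iterations. An exact line search stands in for the library's backtracking search, and all names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cg_beta(g_new: Vector, g_old: Vector, method: str = "PRP+") -> float:
    """Conjugate gradient coefficient beta for the chosen formula."""
    denom = dot(g_old, g_old)
    if method == "FR":  # Fletcher-Reeves
        return dot(g_new, g_new) / denom
    y = [gn - go for gn, go in zip(g_new, g_old)]
    beta_pr = dot(g_new, y) / denom  # Polak-Ribiere
    if method == "PR":
        return beta_pr
    if method == "PRP+":  # clip negative values to zero (restart along -g)
        return max(0.0, beta_pr)
    raise ValueError(f"unknown cg_method: {method}")


def cg_quadratic(A: Matrix, b: Vector, x: Vector, method: str = "PRP+") -> Vector:
    """Minimize 0.5 x^T A x - b^T x with nonlinear CG and an exact line search."""
    matvec = lambda v: [dot(row, v) for row in A]
    g = [ax - bi for ax, bi in zip(matvec(x), b)]  # gradient A x - b
    p = [-gi for gi in g]
    for _ in range(len(b)):
        if dot(g, g) < 1e-30:  # already at the minimizer
            break
        Ap = matvec(p)
        alpha = -dot(g, p) / dot(p, Ap)  # exact minimizer along p for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = [gi + alpha * api for gi, api in zip(g, Ap)]
        beta = cg_beta(g_new, g, method)
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        g = g_new
    return x
```

On a quadratic with exact line searches the successive gradients are orthogonal, so all three formulas give the same β; they only diverge on non-quadratic losses or with inexact line searches, where PRP+ guards against negative β.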
AdaHessian
- class AdaHessian(model: Module, lr_init: float = 1, lr_method: str | None = None, beta1=0.9, beta2=0.999, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, k: float = 1, line_search_method: str = 'const', line_search_cond: str = 'armijo', **kwargs)
Bases: SecondOrderOptimizer
Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py
- Parameters:
model (nn.Module) – The model to be optimized
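lr_init (float) – Initial (maximum) step size tried by the backtracking line search. When line_search_method is “constant”, this value is used as the fixed learning rate.
lr_method (str) – Method used to initialize the learning rate before the line search is applied.
beta1 (float) – Exponential decay rate of the moving average of the gradient (first moment).
beta2 (float) – Exponential decay rate of the moving average of the squared Hessian diagonal estimate (second moment).
c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in the backtracking line search.
c2 (float) – Coefficient of the curvature condition (the second Wolfe condition), used by the “wolfe” and “strong-wolfe” conditions.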
tau (float) – Factor used to reduce the step size in each step of the backtracking line search.
line_search_method (str) – Method used for line search, options are “backtrack” and “constant”.
line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
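AdaHessian adapts Adam by replacing the squared gradient in the second moment with the squared diagonal of the Hessian, estimated with Hutchinson's method as the mean of z ⊙ (Hz) over random Rademacher vectors z. The sketch below assumes an explicit Hessian matrix for the product Hz (a practical implementation would use Hessian-vector products from autograd), reads `k` as the power applied to the denominator, and omits spatial averaging and the line search. The class name, `eps` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(H: Matrix, v: Vector) -> Vector:
    return [sum(h * vi for h, vi in zip(row, v)) for row in H]


def hutchinson_diag(H: Matrix, rng: random.Random, n_samples: int = 1) -> Vector:
    """Estimate diag(H) as the mean of z * (H z) over Rademacher vectors z."""
    est = [0.0] * len(H)
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(len(H))]
        Hz = matvec(H, z)  # stands in for a Hessian-vector product
        est = [e + zi * hzi / n_samples for e, zi, hzi in zip(est, z, Hz)]
    return est


class AdaHessianSketch:
    """Hypothetical AdaHessian-style update on plain lists (no spatial averaging)."""

    def __init__(self, lr: float = 1.0, beta1: float = 0.9, beta2: float = 0.999,
                 k: float = 1.0, eps: float = 1e-8, seed: int = 0):
        self.lr, self.beta1, self.beta2, self.k, self.eps = lr, beta1, beta2, k, eps
        self.rng = random.Random(seed)
        self.m: Optional[Vector] = None  # moving average of the gradient
        self.v: Optional[Vector] = None  # moving average of the squared Hessian diagonal
        self.t = 0

    def step(self, x: Vector, grad: Vector, hess: Matrix) -> Vector:
        d = hutchinson_diag(hess, self.rng)
        if self.m is None:
            self.m, self.v = [0.0] * len(x), [0.0] * len(x)
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        self.m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(self.m, grad)]
        self.v = [b2 * vi + (1 - b2) * di * di for vi, di in zip(self.v, d)]
        bc1, bc2 = 1 - b1 ** self.t, 1 - b2 ** self.t  # bias corrections
        return [
            xi - self.lr * (mi / bc1) / (math.sqrt(vi / bc2) ** self.k + self.eps)
            for xi, mi, vi in zip(x, self.m, self.v)
        ]
```

With a diagonal Hessian the Hutchinson estimate is exact, so on f(x) = x0² + 10·x1² the first bias-corrected step with lr = 1 and k = 1 moves (1, 1) to approximately the minimizer, just like a Newton step; with a non-diagonal Hessian the estimate is noisy and only correct on average.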
Quasi-Newton
(Not implemented)