torch_numopt Implemented Algorithms

AdaHessian

class AdaHessian(model: Module, lr_init: float = 1, lr_method: str | None = None, beta1=0.9, beta2=0.999, c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, k: float = 1, line_search_method: str = 'const', line_search_cond: str = 'armijo', **kwargs)[source]

Bases: SecondOrderOptimizer

Heavily inspired by https://github.com/hahnec/torchimize/blob/master/torchimize/optimizer/gna_opt.py

Parameters:
  • model (nn.Module) – The model to be optimized

  • lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is set as constant, this is the value used.

  • lr_method (str) – Method to use to initialize the learning rate before applying line search.

  • c1 (float) – Coefficient of the sufficient decrease (Armijo) condition in backtracking line search.

  • c2 (float) – Coefficient of the curvature condition, the second of the Wolfe conditions.

  • tau (float) – Factor used to reduce the step size in each step of the backtracking line search.

  • line_search_method (str) – Method used for line search, options are “backtrack” and “const” (constant step size, the default in the signature).

  • line_search_cond (str) – Condition to be used in backtracking line search, options are “armijo”, “wolfe”, “strong-wolfe” and “goldstein”.
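
The roles of lr_init, c1 and tau in a backtracking line search with the Armijo condition can be illustrated with a minimal sketch. This is plain Python on lists of floats, not the torch_numopt implementation; the function name backtracking_armijo, the max_iter argument and the quadratic test function are assumptions introduced only for illustration.

```python
def backtracking_armijo(f, x, grad, direction, lr_init=1.0, c1=1e-4, tau=0.1, max_iter=50):
    """Shrink the step from lr_init by a factor tau until the Armijo condition holds.

    Hypothetical helper for illustration; not part of torch_numopt.
    """
    f0 = f(x)
    # Directional derivative g^T p; it is negative for a descent direction.
    slope = sum(g * p for g, p in zip(grad, direction))
    lr = lr_init
    for _ in range(max_iter):
        x_new = [xi + lr * pi for xi, pi in zip(x, direction)]
        # Armijo (sufficient decrease) condition.
        if f(x_new) <= f0 + c1 * lr * slope:
            return lr
        lr *= tau  # reduce the step size and try again
    return lr


def quadratic(x):
    """Simple test objective f(x) = sum(x_i^2)."""
    return sum(xi * xi for xi in x)


x0 = [3.0, -4.0]
g0 = [2.0 * xi for xi in x0]   # gradient of the quadratic at x0
p0 = [-gi for gi in g0]        # steepest-descent direction
step = backtracking_armijo(quadratic, x0, g0, p0)
x1 = [xi + step * pi for xi, pi in zip(x0, p0)]
```

In this example the full step lr_init = 1 overshoots the minimum and fails the condition, so the step is reduced once by tau before being accepted. With line_search_method left at its constant setting, no such loop would run and lr_init would be used directly.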

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)[source]

get_step_direction(d_p_list, h_list)[source]

Obtains the step direction used to update the network.

Parameters:
  • d_p_list (list) – List of gradients of the parameters.

  • h_list (list) – List of Hessians of the parameters.

Returns:
  p – New search direction

Return type:
  list
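
As a rough illustration of how a search direction could be built from gradients and curvature in the AdaHessian style, the sketch below keeps exponential moving averages with beta1 and beta2 and applies the Hessian power k. It is not the torch_numopt code: the class name AdaHessianDirection, the eps term, and the assumption that each entry of the curvature list is a Hessian diagonal (rather than a full Hessian) are all hypothetical.

```python
import math


class AdaHessianDirection:
    """Hypothetical AdaHessian-style direction on plain lists of floats."""

    def __init__(self, beta1=0.9, beta2=0.999, k=1.0, eps=1e-8):
        self.beta1, self.beta2, self.k, self.eps = beta1, beta2, k, eps
        self.step = 0
        self.m = None  # moving average of gradients
        self.v = None  # moving average of squared Hessian diagonals

    def get_step_direction(self, d_p_list, h_diag_list):
        """Return one search direction per parameter group."""
        self.step += 1
        if self.m is None:
            self.m = [[0.0] * len(g) for g in d_p_list]
            self.v = [[0.0] * len(g) for g in d_p_list]
        # Bias corrections for the zero-initialized moving averages.
        bias1 = 1.0 - self.beta1 ** self.step
        bias2 = 1.0 - self.beta2 ** self.step
        p_list = []
        for i, (g, h) in enumerate(zip(d_p_list, h_diag_list)):
            self.m[i] = [self.beta1 * m + (1.0 - self.beta1) * gj
                         for m, gj in zip(self.m[i], g)]
            self.v[i] = [self.beta2 * v + (1.0 - self.beta2) * hj * hj
                         for v, hj in zip(self.v[i], h)]
            # p = -m_hat / (sqrt(v_hat)^k + eps)
            p = [-(m / bias1) / (math.sqrt(v / bias2) ** self.k + self.eps)
                 for m, v in zip(self.m[i], self.v[i])]
            p_list.append(p)
        return p_list


# One parameter group with gradient [2, -4] and Hessian diagonal [2, 4].
opt = AdaHessianDirection()
p = opt.get_step_direction([[2.0, -4.0]], [[2.0, 4.0]])
```

On the first call the bias corrections cancel the averaging factors, so the direction reduces to the negative gradient divided by the absolute Hessian diagonal, which is a Newton-like step for a diagonal quadratic.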

Quasi-Newton

(Not implemented)