torch_numopt base classes

Custom Optimizer class

class CustomOptimizer(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], defaults: dict[str, Any])[source]

Bases: Optimizer, ABC

Base class for the custom optimization methods in torch_numopt.

abstract step(x: Tensor, y: Tensor, loss_fn: Module, closure: callable | None = None)[source]

Method to update the parameters of the Neural Network.

Parameters:
  • x (torch.Tensor) – Inputs of the Neural Network.

  • y (torch.Tensor) – Targets of the Neural Network.

  • loss_fn (nn.Module) – Loss function to be optimized.

  • closure (callable) – Kept for compatibility, unused.

update(loss: float)[source]

Function to update the internal parameters of the optimization procedure.

Parameters:
  • loss (float) – Loss of the Neural Network with the new parameters.

Line Search Optimizer class

class LineSearchOptimizer(model: Module, lr_init: float = 1, lr_method: str | None = None, line_search_cond: str = 'armijo', line_search_method: str = 'const', c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1)[source]

Bases: CustomOptimizer, ABC

Base class for gradient-based optimization algorithms with line search.

Parameters:
  • model (nn.Module) – The model to be optimized.

  • lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.

  • lr_method (str) – Method to use to initialize the learning rate before applying line search.

  • line_search_cond (str (optional))

  • line_search_method (str (optional))

  • c1 (float (optional))

  • c2 (float (optional))

  • tau (float (optional))

accept_step(params: list, new_params: list, step_dir: list, lr: float, loss: Tensor, new_loss: Tensor, grad: list)[source]

Evaluate the stopping condition used by the line search to decide whether a trial step is accepted.

Parameters:
  • params (list)

  • new_params (list)

  • step_dir (list)

  • lr (float)

  • loss (torch.Tensor)

  • new_loss (torch.Tensor)

  • grad (list)

Returns:

accepted – Whether the trial step satisfies the condition.

Return type:

bool
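For reference, the Armijo (sufficient-decrease) condition, which the default line_search_cond='armijo' presumably refers to, accepts a step of length \(\alpha\) along \(p\) when \(\mathcal{L}(\theta + \alpha p) \le \mathcal{L}(\theta) + c_1 \alpha \nabla\mathcal{L}(\theta)^{\intercal} p\). The sketch below checks this condition on lists of tensors shaped like the arguments of accept_step. The helper name armijo_accept is hypothetical and not part of torch_numopt.

```python
import torch

def armijo_accept(step_dir, lr, loss, new_loss, grad, c1=1e-4):
    """Hypothetical helper (not torch_numopt's code): Armijo sufficient-decrease test.

    step_dir, grad: lists of tensors, one per parameter, with matching shapes.
    loss, new_loss: scalar tensors, the loss before and after the trial step.
    """
    # Directional derivative g^T p, summed over all parameter tensors.
    slope = sum(torch.sum(g * d) for g, d in zip(grad, step_dir))
    # Accept if the actual decrease is at least a fraction c1 of the linear prediction.
    return bool(new_loss <= loss + c1 * lr * slope)
```

With a descent direction the slope is negative, so the right-hand side is slightly below the current loss and a step that barely lowers the loss is rejected.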

backtrack(params: list, step_dir: list, grad: list, lr_init: float, eval_model: callable)[source]

Perform backtracking line search.

Parameters:
  • params (list)

  • step_dir (list)

  • grad (list)

  • lr_init (float)

  • eval_model (callable)

Returns:

new_params

Return type:

list
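As an illustration of the technique (not torch_numopt's implementation), backtracking starts from lr_init and shrinks the step until the Armijo condition holds. The sketch assumes that eval_model maps a list of parameter tensors to a scalar loss, which may not match the callable the library actually passes; the shrink factor shrink and the cap max_iter are hypothetical parameters.

```python
import torch

def backtrack_sketch(params, step_dir, grad, lr_init, eval_model,
                     c1=1e-4, shrink=0.5, max_iter=50):
    """Hypothetical Armijo backtracking line search.

    Assumes eval_model(list_of_tensors) returns the scalar loss at those parameters.
    """
    loss = eval_model(params)
    # Directional derivative g^T p (negative for a descent direction).
    slope = sum(torch.sum(g * d) for g, d in zip(grad, step_dir))
    lr = lr_init
    for _ in range(max_iter):
        # Trial point theta + lr * p, built out of place.
        new_params = [p + lr * d for p, d in zip(params, step_dir)]
        if eval_model(new_params) <= loss + c1 * lr * slope:
            return new_params  # sufficient decrease reached
        lr *= shrink  # shrink the step and retry
    return params  # no acceptable step found: keep the current parameters
```

On \(\mathcal{L}(\theta) = \theta^2\) at \(\theta = 3\) with \(p = -\nabla\mathcal{L} = -6\) and lr_init = 1, the full step lands on \(\theta = -3\) (no decrease) and is rejected, while the halved step reaches the minimum \(\theta = 0\).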

interpolate_cubic(params: list, step_dir: list, grad: list, lr_init: float, eval_model: callable)[source]
Parameters:
  • params (list)

  • step_dir (list)

  • grad (list)

  • lr_init (float)

  • eval_model (callable)

bisect(params, step_dir, lr_init, eval_model, iter_max=1000, tol=1e-05)[source]

initialize_lr(lr: float, grad: list, step_dir: list, eval_model: callable, params: list)[source]
Parameters:
  • lr (float)

  • grad (list)

  • step_dir (list)

  • eval_model (callable)

  • params (list)

apply_gradients(eval_model: callable, params: list, d_p_list: list, h_list: list | None = None)[source]

Updates the parameters of the network using a direction and a step length.

Parameters:
  • eval_model (callable)

  • params (list)

  • d_p_list (list)

  • h_list (list, optional)
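Leaving aside how the step length is chosen, the update described above presumably amounts to \(\theta \leftarrow \theta + \alpha p\) for every parameter tensor. A minimal in-place version follows; the name apply_step_sketch is hypothetical and the step length is passed explicitly as lr, whereas the library obtains it from its own line search.

```python
import torch

@torch.no_grad()  # parameter updates must not be recorded by autograd
def apply_step_sketch(params, step_dir, lr):
    """Hypothetical in-place update theta <- theta + lr * p for each parameter tensor."""
    for p, d in zip(params, step_dir):
        p.add_(d, alpha=lr)
```

Updating in place keeps the nn.Parameter objects registered in the model, so the model and any optimizer holding references to them see the new values.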

abstract get_step_direction(d_p_list: list, h_list: list)[source]

Obtains the step direction used to update the network.

Parameters:
  • d_p_list (list) – List of gradients of the parameters.

  • h_list (list) – List of Hessians of the parameters.

Returns:

p – New search direction

Return type:

list
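Since get_step_direction is abstract, each subclass decides how to turn gradients (and possibly Hessians) into a direction. The sketch below shows two common choices: steepest descent \(p = -g\) and a damped Newton direction \(p = -(H + \lambda I)^{-1} g\). It assumes, hypothetically, that each entry of h_list is the Hessian block of one flattened parameter tensor; torch_numopt's actual subclasses may structure h_list differently. The function names and the damping argument are illustrative only.

```python
import torch

def gradient_direction(d_p_list, h_list=None):
    """Steepest descent: p = -g (ignores curvature information)."""
    return [-g for g in d_p_list]

def newton_direction(d_p_list, h_list, damping=0.0):
    """Hypothetical per-tensor Newton direction p = -(H + damping * I)^{-1} g.

    Assumes h_list[i] is the (n_i x n_i) Hessian block of parameter i, with
    n_i = d_p_list[i].numel(); cross-parameter blocks are ignored.
    """
    directions = []
    for g, h in zip(d_p_list, h_list):
        g_flat = g.reshape(-1)
        eye = torch.eye(g_flat.numel(), dtype=g.dtype, device=g.device)
        # Solve (H + damping * I) p = -g rather than forming the inverse explicitly.
        p = torch.linalg.solve(h + damping * eye, -g_flat)
        directions.append(p.reshape(g.shape))
    return directions
```

For \(H = \mathrm{diag}(2, 4)\) and \(g = (2, 4)\) the Newton direction is \((-1, -1)\), whereas steepest descent gives \((-2, -4)\): curvature rescales each coordinate.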

get_scaling_matrix(x: Tensor, y: Tensor, loss_fn: Module)[source]

Obtains the scaling matrix used to compute the step direction.

Parameters:
  • x (torch.Tensor) – Inputs of the Neural Network.

  • y (torch.Tensor) – Targets of the Neural Network.

  • loss_fn (nn.Module) – Loss function to be optimized.

Returns:

h_list – List of scaling matrices of the parameters.

Return type:

list

step(x: Tensor, y: Tensor, loss_fn: Module)[source]

Method to update the parameters of the Neural Network.

Parameters:
  • x (torch.Tensor) – Inputs of the Neural Network.

  • y (torch.Tensor) – Targets of the Neural Network.

  • loss_fn (nn.Module) – Loss function to be optimized.

Second Order Optimizer class

class SecondOrderOptimizer(model: nn.Module, lr_init: float, lr_method: str = None, line_search_cond='armijo', line_search_method='const', c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, batch_size: int = None)[source]

Bases: LineSearchOptimizer, ABC

Class for Optimization methods using second derivative information.

Parameters:
  • model (nn.Module) – The model to be optimized

  • lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is constant, this is the value used.

  • lr_method (str) – Method to use to initialize the learning rate before applying line search.

  • batch_size (int) – Number of samples processed at a time when calculating the Hessian matrix.

exact_hessian(x, y, loss_fn, vectorize=True)[source]

Calculates the exact Hessian of the loss with respect to the network parameters on a given dataset.

Parameters:
  • x (torch.Tensor) – Input dataset for calculating the loss.

  • y (torch.Tensor) – Target dataset for calculating the loss.

  • loss_fn (nn.Module) – Loss function for which to calculate the Hessian.

  • vectorize (bool) – Use vectorization in PyTorch’s implementation of the Hessian calculation.
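One way to compute such an exact Hessian with plain PyTorch (2.x assumed) is to flatten all parameters into a single vector \(\theta\), express the loss as a function of \(\theta\) through torch.func.functional_call, and pass it to torch.autograd.functional.hessian, whose vectorize flag mirrors the parameter above. This is a sketch of the technique under those assumptions, not torch_numopt's code; exact_hessian_sketch is a hypothetical name.

```python
import torch
from torch import nn
from torch.autograd.functional import hessian
from torch.func import functional_call

def exact_hessian_sketch(model, x, y, loss_fn, vectorize=True):
    """Hypothetical full (p x p) Hessian of loss_fn(model(x), y) w.r.t. all parameters."""
    named = list(model.named_parameters())
    names = [n for n, _ in named]
    shapes = [p.shape for _, p in named]
    numels = [p.numel() for _, p in named]
    # Flatten every parameter tensor into one vector theta of length p.
    theta = torch.cat([p.detach().reshape(-1) for _, p in named])

    def loss_of_flat(flat):
        # Rebuild the parameter dict from the flat vector and run the model with it.
        chunks = flat.split(numels)
        params = {n: c.view(s) for n, c, s in zip(names, chunks, shapes)}
        return loss_fn(functional_call(model, params, (x,)), y)

    return hessian(loss_of_flat, theta, vectorize=vectorize)
```

For a linear model with mean-squared-error loss the result is exactly \(\tfrac{2}{N}\tilde{X}^{\intercal}\tilde{X}\), where \(\tilde{X}\) is the input matrix with a column of ones appended for the bias, which makes a convenient sanity check. The dense \(p\times p\) matrix grows quadratically with the parameter count, so this is only practical for small models.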

approx_hessian_gn(x, y, loss_fn, vectorize=True)[source]

Calculates an approximate Hessian of the loss on a given dataset, as in the Gauss-Newton algorithm. The approximation is built from the Jacobian of the residuals (one residual per data point) with respect to the parameters.

Let the loss function be, for example, the sum of squared errors (the MSE up to a factor \(1/N\)):

\(\mathcal{L}(x,y;\theta) = \sum^{N}_{i=1} (f(x_i; \theta) - y_i)^2 = \sum^{N}_{i=1} r_i^2, \quad r_i = f(x_i; \theta) - y_i\)

Then the Jacobian of the residuals is the \(N\times p\) matrix, with \(p\) being the number of parameters of the model:

\((J_{\theta}[r])_{i,j} = \dfrac{\partial r_i}{\partial \theta_j}\)

The exact Hessian is \(H_{\theta}[\mathcal{L}] = 2\,J_{\theta}[r]^{\intercal} J_{\theta}[r] + 2\sum^{N}_{i=1} r_i \nabla^2_{\theta} r_i\). Dropping the second-order terms \(r_i \nabla^2_{\theta} r_i\) and the constant factor 2 (which only rescales the step), we approximate the Hessian by the product of the transpose of the Jacobian with the Jacobian, a square \(p\times p\) matrix:

\(H_{\theta}[\mathcal{L}] \approx J_{\theta}[r]^{\intercal} \cdot J_{\theta}[r]\)

Parameters:
  • x (torch.Tensor) – Input dataset for calculating the loss.

  • y (torch.Tensor) – Target dataset for calculating the loss.

  • loss_fn (nn.Module) – Loss function for which to calculate the Hessian.

  • vectorize (bool) – Use vectorization in PyTorch’s implementation of the Hessian calculation.
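The Gauss-Newton approximation \(J_{\theta}[r]^{\intercal} J_{\theta}[r]\) can be sketched with torch.autograd.functional.jacobian applied to the residual vector as a function of the flattened parameters. This sketch assumes a squared-error loss, so it builds the residuals \(r_i = f(x_i;\theta) - y_i\) directly instead of taking a loss_fn; gauss_newton_sketch is a hypothetical name, not torch_numopt's API, and PyTorch 2.x is assumed for torch.func.functional_call.

```python
import torch
from torch import nn
from torch.autograd.functional import jacobian
from torch.func import functional_call

def gauss_newton_sketch(model, x, y, vectorize=True):
    """Hypothetical Gauss-Newton approximation J^T J for a squared-error loss."""
    named = list(model.named_parameters())
    names = [n for n, _ in named]
    shapes = [p.shape for _, p in named]
    numels = [p.numel() for _, p in named]
    theta = torch.cat([p.detach().reshape(-1) for _, p in named])

    def residuals(flat):
        chunks = flat.split(numels)
        params = {n: c.view(s) for n, c, s in zip(names, chunks, shapes)}
        # r_i = f(x_i; theta) - y_i, flattened into a single residual vector.
        return (functional_call(model, params, (x,)) - y).reshape(-1)

    J = jacobian(residuals, theta, vectorize=vectorize)  # shape (num_residuals, p)
    return J.T @ J  # shape (p, p)
```

For a linear model the residuals are linear in \(\theta\), so \(J = \tilde{X}\) (inputs with a column of ones for the bias) and the sketch returns \(\tilde{X}^{\intercal}\tilde{X}\), which is exactly half the Hessian of the sum of squared errors, matching the dropped factor 2 above.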

hutchinson_diagonal(x, y, loss_fn, vectorize=True, n_samples=1, as_matrix=False)[source]