torch_numopt base classes
Custom Optimizer class
- class CustomOptimizer(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], defaults: dict[str, Any])
Bases: Optimizer, ABC
Class for Optimization methods using second derivative information.
- abstract step(x: Tensor, y: Tensor, loss_fn: Module, closure: callable | None = None)
Method to update the parameters of the Neural Network.
- Parameters:
x (torch.Tensor) – Inputs of the Neural Network.
y (torch.Tensor) – Targets of the Neural Network.
loss_fn (nn.Module) – Loss function to be optimized.
closure (callable) – Kept for compatibility, unused.
Line Search Optimizer class
- class LineSearchOptimizer(model: Module, lr_init: float = 1, lr_method: str | None = None, line_search_cond: str = 'armijo', line_search_method: str = 'const', c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1)
Bases: CustomOptimizer, ABC
Base class for gradient-based optimization algorithms with line search.
- Parameters:
model (nn.Module)
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is held constant, this is the value used.
lr_method (str) – Method to use to initialize the learning rate before applying line search.
line_search_cond (str (optional))
line_search_method (str (optional))
c1 (float (optional))
c2 (float (optional))
tau (float (optional))
- accept_step(params: list, new_params: list, step_dir: list, lr: float, loss: Tensor, new_loss: Tensor, grad: list)
Compute one of the stopping conditions for line search methods.
- Parameters:
params (list)
new_params (list)
step_dir (list)
lr (float)
loss (torch.Tensor)
new_loss (torch.Tensor)
grad (list)
- Returns:
accepted
- Return type:
bool
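The default value of line_search_cond is 'armijo'. As an illustration of what such a stopping condition checks, here is a minimal pure-Python sketch of the Armijo sufficient-decrease test. It is not the library's implementation: the function name armijo_accept is hypothetical, and plain lists of floats stand in for the tensors that accept_step receives.

```python
def armijo_accept(loss, new_loss, grad, step_dir, lr, c1=1e-4):
    """Return True when the step satisfies the Armijo sufficient-decrease condition."""
    # Directional derivative: inner product of the gradient and the step direction.
    slope = sum(g * p for g, p in zip(grad, step_dir))
    # Accept only if the loss dropped at least c1 * lr * slope below the old loss.
    return new_loss <= loss + c1 * lr * slope


# f(w) = w0^2 + w1^2 at w = (1, 1): loss 2, gradient (2, 2), descent direction (-2, -2).
grad = [2.0, 2.0]
step_dir = [-2.0, -2.0]

# lr = 0.25 moves to (0.5, 0.5), where the loss is 0.5.
small_step_ok = armijo_accept(2.0, 0.5, grad, step_dir, lr=0.25)
# lr = 1.0 overshoots to (-1, -1), where the loss is still 2.
large_step_ok = armijo_accept(2.0, 2.0, grad, step_dir, lr=1.0)
```

With these numbers the shorter step is accepted and the overshooting step is rejected, since the latter does not reduce the loss at all.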
- backtrack(params: list, step_dir: list, grad: list, lr_init: float, eval_model: callable)
Perform backtracking line search.
- Parameters:
params (list)
step_dir (list)
grad (list)
lr_init (float)
eval_model (callable)
- Returns:
new_params
- Return type:
list
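For readers unfamiliar with backtracking line search, the sketch below shows the general technique in pure Python. It is an illustration under assumptions, not the library's code: it assumes tau acts as the factor by which the step length shrinks after each rejected trial, it uses the Armijo condition as the acceptance test, and the names backtrack_sketch, max_iter and quadratic are hypothetical.

```python
def backtrack_sketch(params, step_dir, grad, lr_init, eval_model,
                     c1=1e-4, tau=0.1, max_iter=20):
    """Shrink the step length by tau until the Armijo condition holds."""
    loss = eval_model(params)
    slope = sum(g * p for g, p in zip(grad, step_dir))
    lr = lr_init
    for _ in range(max_iter):
        new_params = [w + lr * p for w, p in zip(params, step_dir)]
        if eval_model(new_params) <= loss + c1 * lr * slope:
            return new_params
        lr *= tau  # assumption: tau is the shrink factor
    return params  # no acceptable step found; keep the current parameters


def quadratic(w):
    """Toy objective f(w) = sum of w_k^2, standing in for eval_model."""
    return sum(v * v for v in w)


start = [1.0, 1.0]
grad = [2.0, 2.0]        # gradient of the toy objective at start
step_dir = [-2.0, -2.0]  # steepest-descent direction
result = backtrack_sketch(start, step_dir, grad, lr_init=1.0, eval_model=quadratic)
```

Here the first trial with lr = 1.0 overshoots and is rejected. The second trial with lr = 0.1 is accepted, so the returned parameters have a lower loss than the starting point.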
- interpolate_cubic(params: list, step_dir: list, grad: list, lr_init: float, eval_model: callable)
- Parameters:
params (list)
step_dir (list)
grad (list)
lr_init (float)
eval_model (callable)
- initialize_lr(lr: float, grad: list, step_dir: list, eval_model: callable, params: list)
- Parameters:
lr (float)
grad (list)
step_dir (list)
eval_model (callable)
params (list)
- apply_gradients(eval_model: callable, params: list, d_p_list: list, h_list: list | None = None)
Updates the parameters of the network using a direction and a step length.
- Parameters:
lr (float)
eval_model (callable)
params (list)
d_p_list (list)
h_list (list, optional)
- abstract get_step_direction(d_p_list: list, h_list: list)
Obtains the step direction used to update the network.
- Parameters:
d_p_list (list) – List of gradients of the parameters.
h_list (list) – List of Hessians of the parameters.
- Returns:
p – New search direction
- Return type:
list
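Subclasses decide how gradients and Hessians are combined into a search direction. As one possible example, a Newton-type method solves H p = -g for the direction p. The pure-Python sketch below does this for a single 2x2 Hessian. The function name newton_direction_2x2 and the fallback to steepest descent for a singular Hessian are assumptions made for illustration, not behavior documented for this library.

```python
def newton_direction_2x2(grad, hess):
    """Solve H p = -g for a 2x2 Hessian using the closed-form inverse."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    if abs(det) < 1e-12:
        # Singular Hessian: fall back to steepest descent (an assumption of this sketch).
        return [-g for g in grad]
    g0, g1 = grad
    # p = H^{-1} (-g), with H^{-1} = 1/det * [[d, -b], [-c, a]].
    return [(-d * g0 + b * g1) / det, (c * g0 - a * g1) / det]


# f(w) = w0^2 + 2 * w1^2 at w = (1, 1): gradient (2, 4), Hessian diag(2, 4).
direction = newton_direction_2x2([2.0, 4.0], [[2.0, 0.0], [0.0, 4.0]])
```

For this quadratic objective, adding the resulting direction to w = (1, 1) with a unit step lands on the minimizer (0, 0).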
Second Order Optimizer class
- class SecondOrderOptimizer(model: nn.Module, lr_init: float, lr_method: str = None, line_search_cond='armijo', line_search_method='const', c1: float = 0.0001, c2: float = 0.9, tau: float = 0.1, batch_size: int = None)
Bases: LineSearchOptimizer, ABC
Class for Optimization methods using second derivative information.
- Parameters:
model (nn.Module) – The model to be optimized
lr_init (float) – Maximum learning rate in backtracking line search. If the learning rate is held constant, this is the value used.
lr_method (str) – Method to use to initialize the learning rate before applying line search.
batch_size (int) – Number of data points to use at a time when calculating the Hessian matrix.
- exact_hessian(x, y, loss_fn, vectorize=True)
Calculation of the exact Hessian of the neural network given a dataset.
- Parameters:
x (torch.Tensor) – Input dataset for calculating the loss.
y (torch.Tensor) – Target dataset for calculating the loss.
loss_fn (nn.Module) – Loss function for which to calculate the Hessian.
vectorize (boolean) – Use vectorization in PyTorch’s implementation of the Hessian calculation.
- approx_hessian_gn(x, y, loss_fn, vectorize=True)
Calculation of an approximate Hessian of the neural network given a dataset, as in the Gauss-Newton algorithm. The approximate Hessian is calculated from the Jacobian of the residuals of the data points with respect to the parameters.
Let the loss function be, for example, the MSE (written here without the \(1/N\) factor), with residuals \(r_i = f(x_i; \theta) - y_i\):
\(\mathcal{L}(x,y;\theta) = \sum^{N}_{i=1} (f(x_i; \theta) - y_i)^2 = \sum^{N}_{i=1} r_i^2\)
Then the Jacobian of the residuals will be the matrix:
\((J_{\theta}[\mathcal{L}])_{i,j} = \dfrac{\partial r_i}{\partial \theta_j}\)
Then we approximate the Hessian as the product of the transposed Jacobian with the Jacobian. The result is a square matrix of size \(p\times p\), where \(p\) is the number of parameters of the model:
\(H_{\theta}[\mathcal{L}] \approx J_{\theta}[\mathcal{L}]^{\intercal} \cdot J_{\theta}[\mathcal{L}]\)
- Parameters:
x (torch.Tensor) – Input dataset for calculating the loss.
y (torch.Tensor) – Target dataset for calculating the loss.
loss_fn (nn.Module) – Loss function for which to calculate the Hessian.
vectorize (boolean) – Use vectorization in PyTorch’s implementation of the Hessian calculation.
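To make the Gauss-Newton product concrete, the following pure-Python sketch builds the residual Jacobian for a small linear model and forms the p x p matrix from it. It only illustrates the formula. The model f(x; theta) = theta0 + theta1 * x and the names residual_jacobian and gauss_newton_hessian are assumptions for this example and say nothing about how approx_hessian_gn is implemented.

```python
def residual_jacobian(xs):
    """Jacobian of r_i = theta0 + theta1 * x_i - y_i with respect to (theta0, theta1).

    Row i holds (dr_i/dtheta0, dr_i/dtheta1) = (1, x_i). For this linear model
    the Jacobian does not depend on theta or on the targets y_i.
    """
    return [[1.0, x] for x in xs]


def gauss_newton_hessian(jac):
    """Return J^T J, a p x p matrix, where p is the number of parameters."""
    p = len(jac[0])
    return [[sum(row[a] * row[b] for row in jac) for b in range(p)]
            for a in range(p)]


xs = [0.0, 1.0, 2.0]                 # N = 3 data points
jac = residual_jacobian(xs)          # shape N x p = 3 x 2
hess_gn = gauss_newton_hessian(jac)  # shape p x p = 2 x 2
```

For these inputs the result is [[3, 3], [3, 5]], that is [[N, sum of x_i], [sum of x_i, sum of x_i^2]]. For a sum-of-squares loss with a linear model, the exact Hessian equals this matrix times the constant factor 2.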