Implement loss momentum for optimisation #4

@tmrob2

Description

Presently, only naive SGD has been implemented. However, momentum is an important extension to SGD - informally, it says that the velocity of updates in a particular direction will persist based on recent history.

The parameter update at each time step will be a weighted average of the parameter updates at past time steps, with the weights decaying exponentially. The higher the momentum, the more the update at each time step is driven by the parameter's accumulated velocity as opposed to its current gradient.

Implementation
The momentum parameter should be defined by mu and the gradient at step t is nabla[t]. The update rule is then nabla[t] + mu * nabla[t-1] + mu^2 * nabla[t-2] + ....

This can be computed like so:

Iteration    Quantity
t1           f(t1) = nabla[t1]
t2           f(t2) = nabla[t2] + mu * f(t1)
t3           f(t3) = nabla[t3] + mu * f(t2)
...          ...
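
As a quick sanity check, here is a minimal standalone sketch (plain C++, with made-up gradient values) showing that the recursion in the table reproduces the exponentially decayed sum from the update rule above:

// Sketch only: verify that f(t) = nabla[t] + mu * f(t-1) matches the
// unrolled sum nabla[t] + mu * nabla[t-1] + mu^2 * nabla[t-2] + ...
// The gradient values are made up purely for illustration.
#include <cassert>
#include <cmath>
#include <vector>

int main() {
    const float mu = 0.9f;
    const std::vector<float> nabla = {0.5f, -0.2f, 0.3f};  // nabla[t1], nabla[t2], nabla[t3]

    // Recursive form, as computed in the table.
    float f = 0.0f;
    for (float g : nabla) f = g + mu * f;

    // Unrolled form from the update rule.
    float unrolled = nabla[2] + mu * nabla[1] + mu * mu * nabla[0];

    assert(std::abs(f - unrolled) < 1e-6f);
    return 0;
}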

Changes

Define a new OptimiserType: MomentumSGD.
Define a new Optimiser parameter: momentum.

The class Optimiser will require a new private member, velocities, of type std::vector<RowMatrixXf>, with one velocity matrix for each parameter in the neural network.
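
A minimal sketch of how these changes might sit in the class is shown below (the class layout, constructor signature, and the RowMatrixXf alias are assumptions for illustration; only OptimiserType, MomentumSGD, momentum, and velocities come from this issue):

#include <Eigen/Dense>
#include <vector>

// Assumed alias mirroring the row-major float matrix type used in the repo.
using RowMatrixXf = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

enum class OptimiserType { SGD, MomentumSGD };  // MomentumSGD is the new type

class Optimiser {
public:
    Optimiser(float lr, float momentum, OptimiserType type)
        : lr(lr), momentum(momentum), type(type) {}
private:
    float lr;
    float momentum;                        // new parameter, mu in the notation above
    OptimiserType type;
    std::vector<RowMatrixXf> velocities;   // new member: one velocity matrix per parameter
};

The velocities can then be initialised once, before training begins: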

// Initialise a zero velocity matrix for every parameter in the network.
// velocities is assumed to be sized to the total number of parameters beforehand.
int param = 0;
for (std::size_t i = 0; i < net->getLayers().size(); ++i) {
    for (std::size_t j = 0; j < net->getLayers()[i]->operations.size(); ++j) {
        Eigen::Ref<RowMatrixXf> paramMatrix = net->getLayers()[i]->operations[j];
        RowMatrixXf velNew(paramMatrix.rows(), paramMatrix.cols());
        velNew.setZero();
        velocities[param++] = velNew;
    }
}

During the update step, the velocity needs to be combined with the momentum parameter and the current gradient as follows:

// Decay the accumulated velocity by the momentum, add the learning-rate-scaled
// gradient, then subtract the velocity from the parameter.
velocities[param] *= momentum;
velocities[param] += lr * ...paramGrad.array();
...param.array() -= velocities[param];
param++;
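
For reference, here is a self-contained sketch of that update on plain Eigen matrices (the parameter, gradient, and hyper-parameter values are made up; in the real code param and paramGrad would come from the network as above):

// Standalone sketch of one momentum-SGD update step on Eigen matrices.
// All concrete values below are illustrative; only the update rule matters.
#include <Eigen/Dense>
#include <iostream>
#include <vector>

using RowMatrixXf = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

int main() {
    const float lr = 0.01f;
    const float momentum = 0.9f;

    RowMatrixXf param = RowMatrixXf::Random(2, 3);      // a network parameter
    RowMatrixXf paramGrad = RowMatrixXf::Random(2, 3);  // its gradient for this step
    std::vector<RowMatrixXf> velocities(1, RowMatrixXf::Zero(2, 3));
    int p = 0;

    // Decay the velocity, accumulate the scaled gradient, apply the update.
    velocities[p] *= momentum;
    velocities[p] += lr * paramGrad;
    param -= velocities[p];

    std::cout << param << std::endl;
    return 0;
}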
