
Transfer learning phases #2006

@lgvaz

Description

🚀 Feature

When doing transfer learning, we need to switch between phases.

Normally, the first phase is to freeze all but the head of the model and train only that.

After a predefined number of epochs, we unfreeze the rest of the model (or a part of it) and start training again (possibly with the help of differential learning rates, described in #2005). We can repeat this phase as many times as we like.
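
As a concrete example, here is a minimal sketch of the two phases in plain PyTorch (the backbone/head split, layer sizes, and learning rates are illustrative assumptions, not an existing API):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64)),  # "backbone"
    nn.Linear(64, 10),                                                # "head"
)
backbone, head = model[0], model[1]

# Phase 1: freeze everything except the head and train only the head
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# ... train for a predefined number of epochs ...

# Phase 2: unfreeze the backbone and keep training, with lower lrs for the
# earlier layers (differential learning rates, see #2005)
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 5e-6},
    {"params": head.parameters(), "lr": 5e-4},
])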

We should implement a class that handles all of that for us. This includes:

  • Unfreeze part of our model
  • Reset and change the lr_scheduler parameters between phases
  • If LearningRateLogger is being used, register the new lr_scheduler

#2005 will take care of the parameter groups.
This issue will take care of what I call "phase switches".

Proposals

There are a few ways of achieving this:

Logic inside on_epoch_start

def on_epoch_start(self):
    if self.current_epoch == 0:
        self.freeze()
        self.trainer.lr_schedulers = ... # Define new scheduler
        
    if self.current_epoch == N_FREEZE_EPOCHS:
        self.unfreeze() # Or partially unfreeze
        self.trainer.lr_schedulers = ... # Define new scheduler

We can keep adding as many milestones as we want this way, but it's important to note that they all have to be defined beforehand.
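
A slightly fuller sketch of the same idea, with all milestones collected up front in one place. The PHASES dict, the self.classifier attribute, and the scheduler registration format are assumptions for illustration; only freeze(), unfreeze(), and current_epoch come from Lightning:

import pytorch_lightning as pl
from torch.optim.lr_scheduler import OneCycleLR

# All phase boundaries have to be known beforehand
PHASES = {
    0: dict(freeze=True,  max_lr=1e-3, epochs=2),  # train the head only
    2: dict(freeze=False, max_lr=5e-4, epochs=5),  # train the whole model
}

class FineTuneModule(pl.LightningModule):
    def on_epoch_start(self):
        cfg = PHASES.get(self.current_epoch)
        if cfg is None:
            return  # not a phase boundary
        if cfg["freeze"]:
            self.freeze()
            for p in self.classifier.parameters():  # keep the head trainable
                p.requires_grad = True
        else:
            self.unfreeze()
        # Build a fresh scheduler for this phase; the exact registration
        # format depends on the Lightning version
        optimizer = self.trainer.optimizers[0]
        scheduler = OneCycleLR(
            optimizer,
            max_lr=cfg["max_lr"],
            epochs=cfg["epochs"],
            steps_per_epoch=len(self.train_dataloader()),
        )
        self.trainer.lr_schedulers = [
            {"scheduler": scheduler, "interval": "step", "frequency": 1}
        ]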

Multiple calls to Trainer.fit

model.freeze()
trainer.fit_one_cycle(model, n_epochs=2, lr=1e-3, pct_start=0.9)
model.unfreeze()
trainer.fit_one_cycle(model, n_epochs=5, lr=slice(5e-6, 5e-4), pct_start=0.2)

This is exactly the flow in fastai. This way of training models is excellent for iterative training, e.g. in a notebook or a REPL.

fit_one_cycle assumes that we are using the OneCycleLR scheduler, that each call is a continuation of the last, and that we want to reset our schedule at each call.

When we pass a slice to lr, we are asking for an interpolation of values across the trainable layer groups.
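
As an illustration of what the slice could expand to (fastai spaces the values geometrically from the earliest to the latest trainable layer group; the helper below is hypothetical):

import numpy as np

def lrs_from_slice(lr, n_groups):
    # Expand a single lr or a slice(lower, upper) into one lr per layer group
    if isinstance(lr, slice):
        return np.geomspace(lr.start, lr.stop, num=n_groups).tolist()
    return [lr] * n_groups

lrs_from_slice(slice(5e-6, 5e-4), 3)  # ~[5e-06, 5e-05, 5e-04]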

Implement a new scheduler (suggested by @williamFalcon)

The scheduler receives a list of dicts; each dict specifies the duration of the phase and its configuration (which layers to freeze, which lrs to use, ...).

scheduler = FineTuneScheduler([
   {'params': [nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2], 'action': 'freeze', 'epoch': 0},
   {'params': [self.c_d2], 'action': 'unfreeze', 'epoch': 2},
])

Then we can just pass the scheduler to the Trainer.
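
A minimal sketch of what such a scheduler could look like as a Lightning Callback, handling only the freeze/unfreeze part. The config format follows the snippet above; the hook used and the way modules are toggled are assumptions, not a decided design:

from pytorch_lightning.callbacks import Callback

class FineTuneScheduler(Callback):
    def __init__(self, phases):
        # phases: list of {'params': [modules], 'action': 'freeze'|'unfreeze', 'epoch': int}
        self.phases = phases

    def on_epoch_start(self, trainer, pl_module):
        for phase in self.phases:
            if phase["epoch"] != trainer.current_epoch:
                continue
            requires_grad = phase["action"] == "unfreeze"
            for module in phase["params"]:
                for p in module.parameters():
                    p.requires_grad = requires_grad

Replacing the lr_scheduler at each phase boundary could be handled in the same hook, in the spirit of the on_epoch_start proposal above.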

Notes

In all cases, the flow should be the same for all standard areas (vision, NLP, time series, ...).

The only things we assume are:

  • You want to train a model in multiple phases
  • The phases are a continuation of each other
