
Feasibility of multi-task training in lightning with dynamic model size #1502

@Ge0rges


Questions and Help

Hello all. I am interested in using Lightning for my research project. However, I'm having trouble assessing whether my architecture is feasible in Lightning, due to a few particularities.

The typical training loop that Lightning abstracts looks like this:

for epoch in range(epochs):
    ...train code...

However, my structure looks more like this:

for task_number in range(number_of_tasks):
    dataloader = Dataloader(task=task_number)  # The dataloader is task-dependent.

    if task_number == 0:
        for epoch in range(epochs):
            ...regular train code...

    else:
        for epoch in range(epochs):
            ...selective retraining...  # Uses PyTorch hooks to train only certain nodes by setting their grads to 0
            model = split(model)  # May add new nodes to the model (size change); also trains the newly added nodes
            if loss > loss_threshold:
                model = dynamic_expansion(model)  # More logic that changes the model size and trains
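
For context, the gradient-zeroing part works with per-parameter backward hooks. A minimal sketch of that mechanism (the freeze_units name and the example mask are mine, purely for illustration):

import torch
import torch.nn as nn

def freeze_units(layer: nn.Linear, frozen_units: torch.Tensor):
    # Zero the gradients of the selected output units, so only the other units train.
    def zero_rows(grad):
        grad = grad.clone()
        grad[frozen_units] = 0.0  # rows of the weight / entries of the bias get no update
        return grad
    layer.weight.register_hook(zero_rows)
    layer.bias.register_hook(zero_rows)

layer = nn.Linear(4, 3)
freeze_units(layer, torch.tensor([0, 2]))  # units 0 and 2 stay fixed while unit 1 trains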

As you can see, there are a few things here that don't translate easily to Lightning: the concept of tasks, task-dependent loaders (for example, the first task is one subset of MNIST and the second task a different subset), and more complex task-dependent logic that may change the model's size and require the newly added nodes to be trained.
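
To make the task-dependent loaders concrete, here is roughly what I mean (the two-digit-classes-per-task split is just an example):

from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def task_dataloader(task_number, batch_size=32):
    # Each task sees a different subset of MNIST, e.g. two digit classes per task.
    mnist = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
    wanted = (mnist.targets == 2 * task_number) | (mnist.targets == 2 * task_number + 1)
    indices = wanted.nonzero().flatten().tolist()
    return DataLoader(Subset(mnist, indices), batch_size=batch_size, shuffle=True)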

I'm interested in using Lightning, but I'm having trouble seeing how this architecture could fit.
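
The closest fit I can see (and I'm not sure it's the intended way) is to keep the task loop outside Lightning, run one fit per task, and push the per-epoch size-changing logic into a callback. A rough sketch, where MyTaskModule, split, and dynamic_expansion are my own code from the pseudocode above:

import pytorch_lightning as pl

class SplitAndExpand(pl.Callback):
    # Runs the size-changing logic from the pseudocode once per epoch.
    def on_epoch_end(self, trainer, pl_module):
        pl_module.net = split(pl_module.net)
        loss = trainer.callback_metrics.get("loss")  # assumes the module logs "loss"
        if loss is not None and loss > loss_threshold:
            pl_module.net = dynamic_expansion(pl_module.net)

model = MyTaskModule()  # hypothetical LightningModule wrapping the network
for task_number in range(number_of_tasks):
    callbacks = [] if task_number == 0 else [SplitAndExpand()]
    trainer = pl.Trainer(max_epochs=epochs, callbacks=callbacks)
    trainer.fit(model, task_dataloader(task_number))  # weights carry over between tasks

What I don't see is how the optimizer would pick up the newly added parameters mid-fit; that's part of what I'm asking.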

Thank you.
