Training Type Plugin environment related setting from Trainer #7007

@shuyingsunshine21

Description

🚀 Feature

Motivation

When a user provides a specific training type plugin, such as DDPPlugin, they currently have to pass num_nodes and sync_batchnorm explicitly. These parameters are already set on the Trainer, so it would probably be better to reuse the Trainer's settings instead of specifying them again.

Currently, we have to specify the settings in both places:

trainer = Trainer(
    num_nodes=2,
    gpus=8,
    sync_batchnorm=True,
    plugins=[
        DDPPlugin(num_nodes=2, sync_batchnorm=True, ...)  # plus other parameters
    ],
)

Ideally, we could write:

trainer = Trainer(
    num_nodes=2,
    gpus=8,
    sync_batchnorm=True,
    plugins=[
        DDPPlugin(...)  # other parameters only
    ],
)

relying on the accelerator_connector to propagate num_nodes and sync_batchnorm from the Trainer to the training type plugin instance.
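
This matters in particular for custom plugins. For illustration only, a hypothetical subclass could then rely entirely on the Trainer settings (the class name and usage here are an assumption, not part of the proposal):

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DDPPlugin

    class MyDDPPlugin(DDPPlugin):
        # Hypothetical custom plugin: num_nodes and sync_batchnorm are not
        # passed in here; the accelerator_connector would copy them from
        # the Trainer.
        pass

    trainer = Trainer(
        num_nodes=2,
        gpus=8,
        sync_batchnorm=True,
        plugins=[MyDDPPlugin()],
    )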

Pitch

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/connectors/accelerator_connector.py#L439-L445

        if hasattr(training_type, 'num_nodes') and getattr(training_type, 'num_nodes') is None:
            training_type.num_nodes = self.num_nodes

        # Automatically set sync_batchnorm if None.
        # Useful for custom plugins.
        if hasattr(training_type, 'sync_batchnorm') and getattr(training_type, 'sync_batchnorm') is None:
            training_type.sync_batchnorm = self.sync_batchnorm

Here, instead of setting the attribute only when getattr(training_type, 'num_nodes') is None, we would override it as long as training_type has the attribute.
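
Concretely, dropping the None check would make the connector always mirror the Trainer settings onto the plugin. A rough sketch of the adjusted logic (the enclosing method name is an assumption; only the two if-blocks correspond to the linked code):

    def resolve_training_type_plugin(self, training_type):
        # Proposed behavior (illustrative sketch): always propagate the
        # Trainer-level setting when the plugin exposes the attribute,
        # instead of only when the attribute is still None.
        if hasattr(training_type, 'num_nodes'):
            training_type.num_nodes = self.num_nodes

        # Same for sync_batchnorm; useful for custom plugins.
        if hasattr(training_type, 'sync_batchnorm'):
            training_type.sync_batchnorm = self.sync_batchnorm

        return training_type

Note that with this behavior a value passed directly to the plugin (e.g. DDPPlugin(num_nodes=4)) would be overridden by the Trainer setting, so the Trainer becomes the single source of truth for these parameters.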

Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on)
