Mixed precision not working with v 1.2 #6077

@harpone

🐛 Bug

After updating from 1.1.1 to 1.2, automatic mixed precision stopped working. Everything is float32, and I'm getting CUDA OOM errors that I shouldn't get with float16 tensors. It worked fine on 1.1.1.
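For context on why a silent fallback to float32 can show up as an OOM: a float32 element takes 4 bytes and a float16 element takes 2, so the same activation tensor needs twice the memory. The sketch below just does that arithmetic. The helper name `tensor_bytes` and the activation shape are made up for illustration and are not taken from the model in this report.

```python
# Bytes per element for the two dtypes discussed in this issue.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4}


def tensor_bytes(shape, dtype):
    """Return the storage size in bytes of a dense tensor of the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * BYTES_PER_ELEMENT[dtype]


# Hypothetical activation: batch 64, 256 channels, 56x56 feature map.
shape = (64, 256, 56, 56)
fp32 = tensor_bytes(shape, "float32")
fp16 = tensor_bytes(shape, "float16")

print(fp32 / 2**20, fp16 / 2**20)  # sizes in MiB -> 196.0 98.0
```

Note that with native AMP the weights are still kept in float32 and some ops run in float32, so the overall saving is usually less than a full halving. A batch size tuned to fit with float16 activations can still run out of memory once those activations are float32.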

Here's my Trainer args (maybe there's a conflicting combo of args or something):

Trainer(logger=logger,
        callbacks=[checkpoint_callback, lr_monitor],
        default_root_dir=None,
        gradient_clip_val=args.gradient_clip_val,
        gpus=args.gpus,
        auto_select_gpus=False,
        log_gpu_memory=None,
        progress_bar_refresh_rate=1,
        check_val_every_n_epoch=args.check_val_every_n_epoch,
        overfit_batches=0.,
        fast_dev_run=False,
        accumulate_grad_batches=1,
        max_epochs=args.max_epochs,
        limit_train_batches=vars(args).get('limit_train_batches', 1.),
        val_check_interval=args.val_check_interval,
        limit_val_batches=args.limit_val_batches,
        accelerator='ddp',
        sync_batchnorm=True,
        precision=args.precision,
        weights_summary='top',
        weights_save_path=None,
        num_sanity_val_steps=args.num_sanity_val_steps,
        resume_from_checkpoint=None,
        benchmark=False,
        deterministic=False,
        reload_dataloaders_every_epoch=True,
        terminate_on_nan=False,
        prepare_data_per_node=True,
        amp_backend='native',
        profiler=args.profiler)
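One way to confirm the "everything's float32" observation independently of the Trainer is to record the dtype of each submodule's output during a forward pass. The following is a minimal sketch, not part of the original report: `record_output_dtypes` is a hypothetical helper, and it relies only on `named_modules()` and `register_forward_hook()` from `torch.nn.Module`.

```python
from collections import Counter


def record_output_dtypes(model):
    """Attach forward hooks that count the dtype of each submodule's output.

    `model` is expected to be a torch.nn.Module. Returns a Counter that is
    filled in as forward passes run, plus the hook handles so the hooks can
    be removed afterwards.
    """
    counts = Counter()
    handles = []

    def hook(module, inputs, output):
        # Only tensor-like outputs expose .dtype; tuple/dict outputs are skipped.
        dtype = getattr(output, "dtype", None)
        if dtype is not None:
            counts[str(dtype)] += 1

    # named_modules() also yields the root module itself (with an empty name).
    for _name, module in model.named_modules():
        handles.append(module.register_forward_hook(hook))
    return counts, handles
```

Call `counts, handles = record_output_dtypes(model)` before training, run a batch or two, print `counts`, then call `h.remove()` on each handle. If native AMP is active, the outputs of linear and convolution layers should show up as `torch.float16`, with some float32 entries from ops that autocast keeps in full precision. If the counter holds only `torch.float32`, autocast is most likely not being applied.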

Environment

  • GCP VM with V100 GPU(s)
  • NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0

Don't really have the time to go deeper than this, but just rolled back to 1.1.1 and everything's fine.

Labels: bug, help wanted, priority: 0
