Closed
Labels: bug (something isn't working), help wanted (open to be worked on), priority: 0 (high priority task)
Description
🐛 Bug
After updating from 1.1.1 to 1.2, automatic mixed precision (AMP) stopped working. Everything is float32, and I'm getting CUDA OOM errors that I shouldn't get with float16 tensors. The same setup worked fine on 1.1.1.
Here are my Trainer args (maybe there's a conflicting combination of args):
```python
Trainer(logger=logger,
        callbacks=[checkpoint_callback, lr_monitor],
        default_root_dir=None,
        gradient_clip_val=args.gradient_clip_val,
        gpus=args.gpus,
        auto_select_gpus=False,
        log_gpu_memory=None,
        progress_bar_refresh_rate=1,
        check_val_every_n_epoch=args.check_val_every_n_epoch,
        overfit_batches=0.,
        fast_dev_run=False,
        accumulate_grad_batches=1,
        max_epochs=args.max_epochs,
        limit_train_batches=vars(args).get('limit_train_batches', 1.),
        val_check_interval=args.val_check_interval,
        limit_val_batches=args.limit_val_batches,
        accelerator='ddp',
        sync_batchnorm=True,
        precision=args.precision,
        weights_summary='top',
        weights_save_path=None,
        num_sanity_val_steps=args.num_sanity_val_steps,
        resume_from_checkpoint=None,
        benchmark=False,
        deterministic=False,
        reload_dataloaders_every_epoch=True,
        terminate_on_nan=False,
        prepare_data_per_node=True,
        amp_backend='native',
        profiler=args.profiler)
```
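Since `precision=args.precision` comes from command-line args, one cheap thing to rule out is the type of that value: if `args` is built with `argparse` and no `type=int`, the Trainer receives the string `'16'` instead of the integer `16`, and how a given Lightning version treats that is not something to rely on. The sketch below assumes `args` comes from a plain `argparse` parser (the issue doesn't show how it is built); the parser and its defaults are hypothetical, and it only checks the value before it reaches the Trainer.

```python
import argparse

# Hypothetical parser: the issue does not show how `args` is actually built.
parser = argparse.ArgumentParser()
# type=int makes "--precision 16" arrive as the integer 16, not the string "16".
parser.add_argument("--precision", type=int, default=32, choices=[16, 32])
parser.add_argument("--gpus", type=int, default=1)

# Simulate running the script with "--precision 16".
args = parser.parse_args(["--precision", "16"])

# Sanity check before building the Trainer: mixed precision is requested
# with the integer 16, so fail loudly if something else slipped through.
assert isinstance(args.precision, int) and args.precision == 16, repr(args.precision)
print(f"precision={args.precision!r}")
```

If this check passes and tensors are still float32 on 1.2 but float16 on 1.1.1, the difference is more likely in the library version than in the arguments.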
Environment
- GCP VM with V100 GPU(s)
- NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0
I don't have time to dig deeper than this, but I rolled back to 1.1.1 and everything works fine.
awaelchli