🐛 Bug
Logging to the console prints some messages twice, my custom logging calls produce no output at all, and verbose EarlyStopping does not print to the console either:
|segmentation|base|py-3.8.5 Stanley in ~/Repos/segmentation
± |master U:1 ?:1 ✗| → python train.py
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using native 16bit precision.
INFO:lightning:Using native 16bit precision.
Missing logger folder: ./logs/11-11-2020-04-29-21_LR0_001_BS5_IS512
WARNING:lightning:Missing logger folder: ./logs/11-11-2020-04-29-21_LR0_001_BS5_IS512
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
INFO:lightning:initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
| Name | Type | Params | In sizes | Out sizes
-------------------------------------------------------------------------------------------------------------------
0 | criterion | BCEWithLogitsLoss | 0 | ? | ?
1 | in_conv | DoubleConvolution | 38 K | [5, 3, 512, 512] | [5, 64, 512, 512]
2 | down_conv_1 | Down | 221 K | [5, 64, 512, 512] | [5, 128, 256, 256]
3 | down_conv_2 | Down | 885 K | [5, 128, 256, 256] | [5, 256, 128, 128]
4 | down_conv_3 | Down | 3 M | [5, 256, 128, 128] | [5, 512, 64, 64]
5 | down_conv_4 | Down | 4 M | [5, 512, 64, 64] | [5, 512, 32, 32]
6 | up_conv_1 | Up | 5 M | [[5, 512, 32, 32], [5, 512, 64, 64]] | [5, 256, 64, 64]
7 | up_conv_2 | Up | 1 M | [[5, 256, 64, 64], [5, 256, 128, 128]] | [5, 128, 128, 128]
8 | up_conv_3 | Up | 369 K | [[5, 128, 128, 128], [5, 128, 256, 256]] | [5, 64, 256, 256]
9 | up_conv_4 | Up | 110 K | [[5, 64, 256, 256], [5, 64, 512, 512]] | [5, 64, 512, 512]
10 | out_conv | OutConvolution | 65 | [5, 64, 512, 512] | [5, 1, 512, 512]
INFO:lightning:
| Name | Type | Params | In sizes | Out sizes
-------------------------------------------------------------------------------------------------------------------
0 | criterion | BCEWithLogitsLoss | 0 | ? | ?
1 | in_conv | DoubleConvolution | 38 K | [5, 3, 512, 512] | [5, 64, 512, 512]
2 | down_conv_1 | Down | 221 K | [5, 64, 512, 512] | [5, 128, 256, 256]
3 | down_conv_2 | Down | 885 K | [5, 128, 256, 256] | [5, 256, 128, 128]
4 | down_conv_3 | Down | 3 M | [5, 256, 128, 128] | [5, 512, 64, 64]
5 | down_conv_4 | Down | 4 M | [5, 512, 64, 64] | [5, 512, 32, 32]
6 | up_conv_1 | Up | 5 M | [[5, 512, 32, 32], [5, 512, 64, 64]] | [5, 256, 64, 64]
7 | up_conv_2 | Up | 1 M | [[5, 256, 64, 64], [5, 256, 128, 128]] | [5, 128, 128, 128]
8 | up_conv_3 | Up | 369 K | [[5, 128, 128, 128], [5, 128, 256, 256]] | [5, 64, 256, 256]
9 | up_conv_4 | Up | 110 K | [[5, 64, 256, 256], [5, 64, 512, 512]] | [5, 64, 512, 512]
10 | out_conv | OutConvolution | 65 | [5, 64, 512, 512] | [5, 1, 512, 512]
Epoch 3: 70%|███████████████████████████████████                | 5173/7395 [33:13<14:16, 2.60it/s, loss=0.327, v_num=0]
Testing: 100%|████████████████████████████████████████████████████| 1752/1752 [43:15<00:00, 1.49s/it]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_f1': tensor(0.9091, device='cuda:0'),
'test_loss': tensor(0.2796, device='cuda:0'),
'test_precision': tensor(0.9091, device='cuda:0'),
'test_recall': tensor(0.9091, device='cuda:0'),
'train_f1': tensor(0.9245, device='cuda:0'),
'train_loss': tensor(0.2836, device='cuda:0'),
'train_precision': tensor(0.9245, device='cuda:0'),
'train_recall': tensor(0.9245, device='cuda:0'),
'val_f1': tensor(0.9164, device='cuda:0'),
'val_loss': tensor(0.2818, device='cuda:0'),
'val_precision': tensor(0.9164, device='cuda:0'),
'val_recall': tensor(0.9164, device='cuda:0')}
--------------------------------------------------------------------------------
Testing: 100%|████████████████████████████████████████████████████| 1752/1752 [43:16<00:00, 1.48s/it]
|segmentation|base|py-3.8.5 Stanley in ~/Repos/segmentation
± |master U:1 ?:2 ✗| →
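
For what it's worth, the doubling above looks like standard Python logging propagation: a named logger that has its own console handler and also propagates records to a configured root logger emits everything twice. A minimal, self-contained sketch of that mechanism (my assumption about the cause, not Lightning's actual internals):

import logging
import sys

# Root logger gets a handler via basicConfig (default format "LEVEL:name:message").
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# A named logger with its own plain handler, still propagating to the root logger.
log = logging.getLogger("lightning")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler(sys.stdout))

log.info("GPU available: True, used: True")
# Output matches the duplicated pattern above:
#   GPU available: True, used: True
#   INFO:lightning:GPU available: True, used: True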
To Reproduce
Here is my training code:
import logging
import os
import sys
from argparse import ArgumentParser
from datetime import datetime
from knockknock import discord_sender
import torch
from dotenv import load_dotenv
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning.loggers import TensorBoardLogger
from torch.backends import cudnn
from unet.unet_model import UNet
load_dotenv(verbose=True)
@discord_sender(webhook_url=os.getenv("DISCORD_WH"))
def main():
    """
    Main training loop.
    """
    parser = ArgumentParser()
    parser = UNet.add_model_specific_args(parser)
    parser = Trainer.add_argparse_args(parser)
    args = parser.parse_args()

    prod = bool(os.getenv("PROD"))
    logging.getLogger("lightning").setLevel(logging.INFO)

    if prod:
        logging.info("Training in production mode, disabling all debugging APIs")
        torch.autograd.set_detect_anomaly(False)
        torch.autograd.profiler.profile(enabled=False)
        torch.autograd.profiler.emit_nvtx(enabled=False)
    else:
        logging.info("Training in development mode, debugging APIs active.")
        torch.autograd.set_detect_anomaly(True)
        torch.autograd.profiler.profile(
            enabled=True, use_cuda=True, record_shapes=True, profile_memory=True
        )
        torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=True)

    model = UNet(**vars(args))

    logging.info(
        f"Network:\n"
        f"\t{model.hparams.n_channels} input channels\n"
        f"\t{model.hparams.n_classes} output channels (classes)\n"
        f'\t{"Bilinear" if model.hparams.bilinear else "Transposed conv"} upscaling'
    )

    cudnn.benchmark = True  # cudnn autotuner
    cudnn.enabled = True  # look for optimal algorithms

    early_stop_callback = EarlyStopping(
        monitor="val_loss",
        min_delta=0.00,
        mode="min",
        patience=3 if not os.getenv("EARLY_STOP") else int(os.getenv("EARLY_STOP")),
        verbose=True,
    )

    run_name = "{}_LR{}_BS{}_IS{}".format(
        datetime.now().strftime("%d-%m-%Y-%H-%M-%S"),
        args.lr,
        args.batch_size,
        args.image_size,
    ).replace(".", "_")

    log_folder = (
        "./logs" if not os.getenv("DIR_ROOT_DIR") else os.getenv("DIR_ROOT_DIR")
    )
    if not os.path.isdir(log_folder):
        os.mkdir(log_folder)
    logger = TensorBoardLogger(log_folder, name=run_name)

    try:
        trainer = Trainer.from_argparse_args(
            args,
            gpus=-1,
            precision=16,
            distributed_backend="ddp",
            logger=logger,
            callbacks=[early_stop_callback],
            accumulate_grad_batches=1
            if not os.getenv("ACC_GRAD")
            else int(os.getenv("ACC_GRAD")),
            gradient_clip_val=0.0
            if not os.getenv("GRAD_CLIP")
            else float(os.getenv("GRAD_CLIP")),
            max_epochs=100 if not os.getenv("EPOCHS") else int(os.getenv("EPOCHS")),
            val_check_interval=0.1
            if not os.getenv("VAL_INT_PER")
            else float(os.getenv("VAL_INT_PER")),
            default_root_dir=os.getcwd()
            if not os.getenv("DIR_ROOT_DIR")
            else os.getenv("DIR_ROOT_DIR"),
        )
        trainer.fit(model)
        trainer.test(model)
    except KeyboardInterrupt:
        torch.save(model.state_dict(), "INTERRUPTED.pth")
        logging.info("Saved interrupt")
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)


if __name__ == "__main__":
    main()

Expected behavior
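Each Lightning message should be printed to the console exactly once, and the plain logging.info(...) calls in the script above should be visible as well. A hedged sketch of a user-side setup that would behave that way, assuming the duplication comes from the "lightning" logger propagating into the root logger (the real fix presumably belongs in Lightning itself):

import logging
import sys

# Give the root logger one handler so module-level logging.info(...) shows up.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Assumption: the "lightning" logger already has its own handler, so stop it
# from also propagating records up to the root handler (avoids double prints).
logging.getLogger("lightning").propagate = False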
Environment
- CUDA:
	- GPU:
		- GeForce RTX 2070 SUPER
	- available: True
	- version: 11.0
- Packages:
	- numpy: 1.19.4
	- pyTorch_debug: True
	- pyTorch_version: 1.7.0+cu110
	- pytorch-lightning: 1.0.5
	- tqdm: 4.51.0
- System:
	- OS: Linux
	- architecture:
		- 64bit
		- ELF
	- processor: x86_64
	- python: 3.8.5
	- version: #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020