Closed
Description
🐛 Bug
Training fails inside `trainer.fit` with `Exception: The wandb backend process has shutdown`.
Full error output:
```
Traceback (most recent call last):
  File "/home/cat/PycharmProjects/torch-ocr/tools/train/det_train/train.py", line 59, in <module>
    trainer.fit(model, data)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in fit
    self._call_and_handle_interrupt(
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1193, in _run
    self._dispatch()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1272, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1282, in run_stage
    return self._run_train()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1312, in _run_train
    self.fit_loop.run()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 232, in advance
    self.trainer.logger_connector.update_train_step_metrics()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 225, in update_train_step_metrics
    self.log_metrics(self.metrics["log"])
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 121, in log_metrics
    self.trainer.logger.save()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 427, in save
    logger.save()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 317, in save
    self._finalize_agg_metrics()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 152, in _finalize_agg_metrics
    self.log_metrics(metrics=metrics_to_log, step=agg_step)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 49, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 370, in log_metrics
    self.experiment.log({**metrics, "trainer/global_step": step})
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 43, in experiment
    return get_experiment() or DummyExperiment()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 49, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 41, in get_experiment
    return fn(self)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 349, in experiment
    self._experiment.define_metric("trainer/global_step")
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 2195, in define_metric
    m._commit()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_metric.py", line 117, in _commit
    self._callback(m)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 933, in _metric_callback
    self._backend.interface._publish_metric(metric_record)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 309, in _publish_metric
    self._publish(rec)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 223, in _publish
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
wandb: Waiting for W&B process to finish, PID 7468... (failed 1). Press ctrl-c to abort syncing.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1671, in _atexit_cleanup
    self._on_finish()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1844, in _on_finish
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 82, in _publish_telemetry
    self._publish(rec)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 223, in _publish
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1680, in _atexit_cleanup
    self._backend.cleanup()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 228, in cleanup
    self.interface.join()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 481, in join
    super(InterfaceQueue, self).join()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 591, in join
    self._communicate_shutdown()
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 478, in _communicate_shutdown
    _ = self._communicate(record)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 232, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "/home/cat/miniconda3/envs/torch-ocr/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 237, in _communicate_async
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
```
Process finished with exit code 1
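Reading the traceback: every failure ends in `wandb/sdk/interface/interface_queue.py` (`_publish` at line 223, `_communicate_async` at line 237), which, judging by the message, raises once the client side finds that the wandb backend process is no longer running. During training it surfaces when `WandbLogger.experiment` calls `define_metric("trainer/global_step")`; at interpreter exit, `_atexit_cleanup` hits the same check again while sending telemetry and the shutdown request. The traceback itself does not show why the backend stopped. The toy model below sketches that pattern under simplifying assumptions: a thread stands in for the separate backend process, and `QueueInterface`, `run_backend`, and `BackendShutdownError` are hypothetical names, not wandb's API.

```python
import queue
import threading
from typing import List, Optional


class BackendShutdownError(Exception):
    """Hypothetical stand-in for the plain Exception raised in the traceback."""


def run_backend(records: "queue.Queue[Optional[dict]]", handled: List[dict]) -> None:
    # Toy backend: consume records until a None sentinel tells it to exit.
    while True:
        record = records.get()
        if record is None:
            return
        handled.append(record)


class QueueInterface:
    """Toy client side: hands records to the backend through a queue."""

    def __init__(self, backend: threading.Thread, records: "queue.Queue[Optional[dict]]") -> None:
        self._backend = backend
        self._records = records

    def publish(self, record: dict) -> None:
        # Refuse to enqueue once the backend is gone: nothing would ever
        # consume the record, so the caller gets an immediate error instead.
        if not self._backend.is_alive():
            raise BackendShutdownError("The wandb backend process has shutdown")
        self._records.put(record)


records = queue.Queue()
handled: List[dict] = []
backend = threading.Thread(target=run_backend, args=(records, handled))
backend.start()
client = QueueInterface(backend, records)

client.publish({"define_metric": "trainer/global_step"})  # backend alive: accepted
records.put(None)  # simulate the backend exiting for some unrelated reason
backend.join()

try:
    client.publish({"define_metric": "trainer/global_step"})
except BackendShutdownError as err:
    print(f"publish failed: {err}")  # prints "publish failed: The wandb backend process has shutdown"
```

In this model the error is only a symptom reported at the next publish; whatever stopped the backend happened earlier, which is why the wandb-side logs from the same run are likely more informative than this traceback.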
To Reproduce
Expected behavior
Environment
- PyTorch Lightning Version (e.g., 1.5.0): 1.5.1
- PyTorch Version (e.g., 1.10): 1.8
- Python version (e.g., 3.9): 3.8
- OS (e.g., Linux): Ubuntu 20
- CUDA/cuDNN version: CUDA 11.2
- GPU models and configuration:
- How you installed PyTorch (`conda`, `pip`, source): conda
- If compiling from source, the output of `torch.__config__.show()`:
- Any other relevant information:
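The environment list does not record the wandb version, although both tracebacks end inside the `wandb` package. A minimal sketch for printing the listed versions plus wandb's, assuming Python 3.8+ (for `importlib.metadata`) and that the packages expose standard distribution metadata under the names `pytorch-lightning`, `torch`, and `wandb`; `collect_versions` is a hypothetical helper, not part of Lightning.

```python
import platform
from importlib import metadata
from typing import Dict

# Distribution names to look up; the traceback runs through pytorch-lightning and wandb.
PACKAGES = ("pytorch-lightning", "torch", "wandb")


def collect_versions() -> Dict[str, str]:
    """Return Python, OS, and package versions as plain strings."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    for name in PACKAGES:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    try:
        import torch  # only needed to read the CUDA version torch was built against

        info["torch CUDA build"] = str(torch.version.cuda)
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Reading versions from package metadata avoids importing wandb or Lightning just to report them; torch is imported only for its CUDA build string, and the script still runs if torch is missing.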