[Bug] Official Ray tune Pytorch Lightning example compatibility issue #21874

@JiaxuanYou

Description

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Tune

What happened + What you expected to happen

Running the official Ray Tune PyTorch Lightning example fails with the latest PyTorch Lightning (1.5.8). After downgrading PyTorch Lightning to 1.4.5, everything works.

Error log:

Traceback (most recent call last):
  File "pbt_pl.py", line 302, in <module>
    tune_mnist_pbt(num_samples=4, num_epochs=10, gpus_per_trial=0)
  File "pbt_pl.py", line 262, in tune_mnist_pbt
    analysis = tune.run(
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/tune.py", line 597, in run
    runner.step()
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 739, in step
    self._process_events(timeout=timeout)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 897, in _process_events
    self._process_trial(trial)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=10211, ip=127.0.0.1, repr=<types.ImplicitFunc object at 0x7fc9c11c0640>)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/trainable.py", line 255, in train_buffered
    result = self.train()
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/trainable.py", line 314, in train
    result = self.step()
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/function_runner.py", line 381, in step
    self._report_thread_runner_error(block=True)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/function_runner.py", line 531, in _report_thread_runner_error
    raise TuneError(
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=10211, ip=127.0.0.1, repr=<types.ImplicitFunc object at 0x7fc9c11c0640>)
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/function_runner.py", line 262, in run
    self._entrypoint()
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/function_runner.py", line 330, in entrypoint
    return self._trainable_func(self.config, self._status_reporter,
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/function_runner.py", line 597, in _trainable_func
    output = fn()
  File "/Users/jiaxuan/anaconda3/lib/python3.8/site-packages/ray/tune/utils/trainable.py", line 344, in inner
    trainable(config, **fn_kwargs)
  File "pbt_pl.py", line 182, in train_mnist_tune_checkpoint
    trainer.current_epoch = ckpt["epoch"]
AttributeError: can't set attribute
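
For context, "can't set attribute" is the message Python 3.8 raises when code assigns to a property that defines no setter. That suggests, though this report does not confirm it, that `trainer.current_epoch` is read-only in PyTorch Lightning 1.5.8. The sketch below reproduces the same failure mode without Ray or PyTorch Lightning; `ToyTrainer` and `try_restore_epoch` are hypothetical names made up for illustration and are not part of either library.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer object; NOT PyTorch Lightning code."""

    def __init__(self):
        self._epoch = 0

    @property
    def current_epoch(self):
        # Read-only property: no setter is defined, so assignment fails.
        return self._epoch


def try_restore_epoch(trainer, ckpt):
    """Mimic the failing line from the traceback; return the error, or None."""
    try:
        trainer.current_epoch = ckpt["epoch"]
    except AttributeError as exc:
        return exc
    return None


trainer = ToyTrainer()
error = try_restore_epoch(trainer, {"epoch": 3})

# The assignment is rejected and the stored epoch is left unchanged.
print(type(error).__name__)
print(trainer.current_epoch)
```

The exact wording of the error message differs between Python versions, so the sketch checks only the exception type.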

Versions / Dependencies

Ray: 1.9.2
Python: 3.8
OS: Mac & Ubuntu
PyTorch Lightning: 1.5.8 (latest)

Reproduction script

python ray/tune/examples/mnist_pytorch_lightning.py

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Labels

  • bug: Something that is supposed to be working; but isn't
  • triage: Needs triage (eg: priority, bug/not-bug, and owning component)
  • tune: Tune-related issues
