This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@joecummings
Member

Error

[[cure_business]TorchScriptTrain](https://www.internalfb.com/fblearner/details/379462033/operator/4373394157?tab) — ran for 4 min 9 s
Try #3


    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/tmp/jetter.zjtu55hv/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/tmp/jetter.zjtu55hv/fblearner/flow/projects/fluent2/definition/transformers/ecg/huggingface_transformers_4_6/optimization.py", line 368, in step
    loss = closure()
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/loops/optimization/optimizer_loop.py", line 422, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/trainer/trainer.py", line 1752, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/tmp/jetter.zjtu55hv/pytorch_lightning/strategies/strategy.py", line 340, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/tmp/jetter.zjtu55hv/fblearner/flow/projects/fluent2/definition/transformers/ecg/ecg_two_tower.py", line 340, in training_step
    loss, _ = self.train_eval_batch(batch)
  File "/tmp/jetter.zjtu55hv/fblearner/flow/projects/fluent2/definition/transformers/ecg/ecg_two_tower.py", line 312, in train_eval_batch
    embeddings_a = self.model(**model_inputs_a)
  File "/tmp/jetter.zjtu55hv/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/jetter.zjtu55hv/fblearner/flow/projects/fluent2/definition/transformers/ecg/t5_sentence_embeddings.py", line 135, in forward
    model_output = self.model(
  File "/tmp/jetter.zjtu55hv/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/model.py", line 173, in forward
    encoder_output, encoder_hidden_states, encoder_position_bias, encoder_sa = self.encoder(
  File "/tmp/jetter.zjtu55hv/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 865, in forward
    output, position_bias, sa_score = mod(
  File "/tmp/jetter.zjtu55hv/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 616, in forward
    sa_out, position_bias, sa_scores = self._sa_block(self.norm1(x), tgt_mask, tgt_key_padding_mask, position_bias)
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 630, in _sa_block
    attn = self.self_attn(
  File "/tmp/jetter.zjtu55hv/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 132, in forward
    attn_output, position_bias, attn_output_weights = self._t5_multi_head_attention_forward(
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 257, in _t5_multi_head_attention_forward
    position_bias = self._compute_bias(
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 420, in _compute_bias
    relative_position_bucket = self._relative_position_bucket(
  File "/tmp/jetter.zjtu55hv/torchtext/prototype/models/t5/modules.py", line 454, in _relative_position_bucket
    relative_buckets += (relative_position > 0).to(torch.long) * num_buckets
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Fix & Context

`relative_buckets` is a tensor created without an explicit `device` argument, so it defaults to CPU. The other tensors in `_relative_position_bucket` are on CUDA, so the arithmetic at `relative_buckets += (relative_position > 0).to(torch.long) * num_buckets` raises a device-mismatch error. Discovered while working on the AI for CS workflow.
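A minimal sketch of the bug and fix (this is a simplified, hypothetical `bucket_sign` helper, not the actual torchtext code; it isolates just the allocation-plus-arithmetic pattern from `_relative_position_bucket`):

```python
import torch

# Buggy pattern: torch.zeros(...) without a device argument allocates on
# CPU, so when relative_position lives on cuda:0 the += below raises
# "Expected all tensors to be on the same device".
#   relative_buckets = torch.zeros(relative_position.shape, dtype=torch.long)
#
# Fixed pattern: zeros_like inherits the input tensor's device (and works
# equally on CPU and CUDA).
def bucket_sign(relative_position: torch.Tensor, num_buckets: int = 32) -> torch.Tensor:
    relative_buckets = torch.zeros_like(relative_position, dtype=torch.long)
    relative_buckets += (relative_position > 0).to(torch.long) * num_buckets
    return relative_buckets

rp = torch.tensor([[-2, 0, 3]])
print(bucket_sign(rp))  # tensor([[ 0,  0, 32]])
```

Passing `device=relative_position.device` to `torch.zeros` would fix it equally well; `zeros_like` just makes the device (and shape) dependency explicit.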

Testing

Tested with Fluent2 and a Bento notebook; passes the existing tests here. Do we have any integration tests with CUDA that we could run in OSS to check this?

Contributor

@Nayef211 left a comment


Thanks for the fix @joecummings

@joecummings joecummings merged commit 4570a56 into pytorch:main Oct 17, 2022
@joecummings joecummings deleted the device-mismatch-butg branch October 17, 2022 17:57