[bug]: RuntimeError: CUDA error: device-side assert triggered #1992

@TheBarret

Is there an existing issue for this?

  • I have searched the existing issues

OS

Windows

GPU

cuda

VRAM

4GB

What happened?

I was generating a new iteration with a custom model, using the k_euler_a and k_dpmpp_2_a samplers and a prompt of about 476 characters.
The log reports that the prompt is too long, but I have used this prompt before without problems.
I updated to the latest InvokeAI version, 2.2.4, using the manual git pull method and then ran the reconfigure script.
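
For context on the truncation messages in the log: the limit is counted in tokens, not characters, so a 476-character prompt can still go over it. The sketch below is only an illustration of how such a count could be produced. It assumes a 75-token budget (the 77-token CLIP context used by Stable Diffusion v1 minus the start and end markers); `truncate_tokens` and `MAX_PROMPT_TOKENS` are hypothetical names, not InvokeAI's actual code.

```python
# Hypothetical sketch, not InvokeAI's implementation.
# Assumption: 77-token CLIP context minus start/end markers leaves 75 prompt tokens.
MAX_PROMPT_TOKENS = 75


def truncate_tokens(token_ids, max_tokens=MAX_PROMPT_TOKENS):
    """Return (kept_tokens, number_of_dropped_tokens)."""
    dropped = max(0, len(token_ids) - max_tokens)
    kept = token_ids[:max_tokens]
    return kept, dropped


# Stand-in for a tokenized prompt that is 81 tokens long.
tokens = list(range(81))
kept, dropped = truncate_tokens(tokens)
if dropped:
    print(f">> Prompt is {dropped} token(s) too long and has been truncated")
```

If truncation works roughly like this, the kept tokens always fit the context, so truncation by itself would not be expected to produce an out-of-range index.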

Startup command: python scripts/invoke.py --web --no-nsfw_checker --model swpunk

>> Setting Sampler to k_euler_a
>> Prompt is 6 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
Generating:   0%|                                                                                | 0/1 [00:00<?, ?it/s]
>> Ksampler using model noise schedule (steps >= 30)
>> Sampling with k_euler_ancestral starting at step 0 of 32 (32 new sampling steps)
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [5,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [64,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [65,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [66,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [67,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [8,0,0], thread: [68,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [64,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [65,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [66,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [18,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [18,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [18,0,0], thread: [98,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [10,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [10,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [98,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [99,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [13,0,0], thread: [100,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
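
The assertion repeated in the lines above checks that every index used in a tensor lookup satisfies `index >= -sizes[i] && index < sizes[i]`. A minimal CPU-side sketch of the same condition, which could be used to find offending values before they reach the GPU, might look like this. The function name and the sample values are hypothetical.

```python
def out_of_bounds(indices, size):
    """Return the indices that would fail `index >= -size and index < size`."""
    return [i for i in indices if not (-size <= i < size)]


# Hypothetical values: a dimension of size 77 and a few candidate indices.
bad = out_of_bounds([0, 76, -77, 77, -78], size=77)
print(bad)
```

Negative indices down to `-size` are valid because they count from the end, which is why only values outside `[-size, size)` are reported.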

These assertion errors continue for 1308 lines in total, and then the following exception is thrown:

Traceback (most recent call last):
  File "d:\ai\invokeai\ldm\generate.py", line 492, in prompt2image
    results = generator.generate(
  File "d:\ai\invokeai\ldm\invoke\generator\base.py", line 98, in generate
    image = make_image(x_T)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\ai\invokeai\ldm\invoke\generator\txt2img.py", line 42, in make_image
    samples, _ = sampler.sample(
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 226, in sample
    K.sampling.__dict__[f'sample_{self.schedule}'](
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 52, in forward
    next_x = self.invokeai_diffuser.do_diffusion_step(x, sigma, uncond, cond, cond_scale)
  File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 107, in do_diffusion_step
    unconditioned_next_x, conditioned_next_x = self.apply_standard_conditioning(x, sigma, unconditioning, conditioning)
  File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 123, in apply_standard_conditioning
    unconditioned_next_x, conditioned_next_x = self.model_forward_callback(x_twice, sigma_twice,
  File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 38, in <lambda>
    model_forward_callback=lambda x, sigma, cond: self.inner_model(x, sigma, cond=cond))
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\external.py", line 114, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\external.py", line 140, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "d:\ai\invokeai\ldm\models\diffusion\ddpm.py", line 1441, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\models\diffusion\ddpm.py", line 2167, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\modules\diffusionmodules\openaimodel.py", line 806, in forward
    h = module(h, emb, context)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\modules\diffusionmodules\openaimodel.py", line 88, in forward
    x = layer(x, context)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\modules\attention.py", line 271, in forward
    x = block(x, context=context)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\modules\attention.py", line 221, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "d:\ai\invokeai\ldm\modules\diffusionmodules\util.py", line 159, in checkpoint
    return func(*inputs)
  File "d:\ai\invokeai\ldm\modules\attention.py", line 226, in _forward
    x += self.attn2(self.norm2(x.clone()), context=context)
  File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\ai\invokeai\ldm\modules\attention.py", line 199, in forward
    r = self.get_invokeai_attention_mem_efficient(q, k, v)
  File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 291, in get_invokeai_attention_mem_efficient
    return self.einsum_op_cuda(q, k, v)
  File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 285, in einsum_op_cuda
    return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))
  File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 264, in einsum_op_tensor_mem
    return self.einsum_lowest_level(q, k, v, None, None, None)
  File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 229, in einsum_lowest_level
    self.attention_slice_calculated_callback(attention_slice, dim, offset, slice_size)
  File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 69, in <lambda>
    lambda slice, dim, offset, slice_size, key=key: callback(slice, dim, offset, slice_size, key))
  File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 61, in callback
    saver.add_attention_maps(slice, key)
  File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_map_saving.py", line 39, in add_attention_maps
    self.collated_maps[key_and_size] += maps.cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

>> Could not generate image.
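
As the error message notes, CUDA kernel errors are reported asynchronously, so the traceback above may not point at the call that actually failed. One way to follow its suggestion is to rerun the same startup command with `CUDA_LAUNCH_BLOCKING=1` set in the environment. The sketch below only prepares the command and environment; `blocking_env` is a hypothetical helper name, and the actual rerun is left as a comment.

```python
import os
import shlex


def blocking_env(base_env=None):
    """Copy an environment mapping and enable synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Startup command taken from the report above.
command = shlex.split("python scripts/invoke.py --web --no-nsfw_checker --model swpunk")
env = blocking_env()

# To actually rerun with the variable set (requires `import subprocess`):
#   subprocess.run(command, env=env, check=False)
print(command, env["CUDA_LAUNCH_BLOCKING"])
```

With blocking launches the run is slower, but the Python traceback should then correspond to the operation that triggered the assert.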

Screenshots

No response

Additional context

No response

Contact Details

No response
