Skip to content

Commit 5a7ae7b

Browse files
jkrzyszt-intelgregkh
authored andcommitted
drm/i915/gt: Fix timeline left held on VMA alloc error
[ Upstream commit a5aa7bc ] The following error has been reported sporadically by CI when a test unbinds the i915 driver on a ring submission platform: <4> [239.330153] ------------[ cut here ]------------ <4> [239.330166] i915 0000:00:02.0: [drm] drm_WARN_ON(dev_priv->mm.shrink_count) <4> [239.330196] WARNING: CPU: 1 PID: 18570 at drivers/gpu/drm/i915/i915_gem.c:1309 i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330640] RIP: 0010:i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330942] Call Trace: <4> [239.330944] <TASK> <4> [239.330949] i915_driver_late_release+0x2b/0xa0 [i915] <4> [239.331202] i915_driver_release+0x86/0xa0 [i915] <4> [239.331482] devm_drm_dev_init_release+0x61/0x90 <4> [239.331494] devm_action_release+0x15/0x30 <4> [239.331504] release_nodes+0x3d/0x120 <4> [239.331517] devres_release_all+0x96/0xd0 <4> [239.331533] device_unbind_cleanup+0x12/0x80 <4> [239.331543] device_release_driver_internal+0x23a/0x280 <4> [239.331550] ? bus_find_device+0xa5/0xe0 <4> [239.331563] device_driver_detach+0x14/0x20 ... <4> [357.719679] ---[ end trace 0000000000000000 ]--- If the test also unloads the i915 module then that's followed with: <3> [357.787478] ============================================================================= <3> [357.788006] BUG i915_vma (Tainted: G U W N ): Objects remaining on __kmem_cache_shutdown() <3> [357.788031] ----------------------------------------------------------------------------- <3> [357.788204] Object 0xffff888109e7f480 @offset=29824 <3> [357.788670] Allocated in i915_vma_instance+0xee/0xc10 [i915] age=292729 cpu=4 pid=2244 <4> [357.788994] i915_vma_instance+0xee/0xc10 [i915] <4> [357.789290] init_status_page+0x7b/0x420 [i915] <4> [357.789532] intel_engines_init+0x1d8/0x980 [i915] <4> [357.789772] intel_gt_init+0x175/0x450 [i915] <4> [357.790014] i915_gem_init+0x113/0x340 [i915] <4> [357.790281] i915_driver_probe+0x847/0xed0 [i915] <4> [357.790504] i915_pci_probe+0xe6/0x220 [i915] ... Closer analysis of CI results history has revealed a dependency of the error on a few IGT tests, namely: - igt@api_intel_allocator@fork-simple-stress-signal, - igt@api_intel_allocator@two-level-inception-interruptible, - igt@gem_linear_blits@interruptible, - igt@prime_mmap_coherency@ioctl-errors, which invisibly trigger the issue, then exhibited with first driver unbind attempt. All of the above tests perform actions which are actively interrupted with signals. Further debugging has allowed to narrow that scope down to DRM_IOCTL_I915_GEM_EXECBUFFER2, and ring_context_alloc(), specific to ring submission, in particular. If successful then that function, or its execlists or GuC submission equivalent, is supposed to be called only once per GEM context engine, followed by raise of a flag that prevents the function from being called again. The function is expected to unwind its internal errors itself, so it may be safely called once more after it returns an error. In case of ring submission, the function first gets a reference to the engine's legacy timeline and then allocates a VMA. If the VMA allocation fails, e.g. when i915_vma_instance() called from inside is interrupted with a signal, then ring_context_alloc() fails, leaving the timeline held referenced. On next I915_GEM_EXECBUFFER2 IOCTL, another reference to the timeline is got, and only that last one is put on successful completion. As a consequence, the legacy timeline, with its underlying engine status page's VMA object, is still held and not released on driver unbind. Get the legacy timeline only after successful allocation of the context engine's VMA. v2: Add a note on other submission methods (Krzysztof Karas): Both execlists and GuC submission use lrc_alloc() which seems free from a similar issue. Fixes: 75d0a7f ("drm/i915: Lift timeline into intel_context") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12061 Cc: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Krzysztof Karas <[email protected]> Reviewed-by: Sebastian Brzezinka <[email protected]> Reviewed-by: Krzysztof Niemiec <[email protected]> Signed-off-by: Janusz Krzysztofik <[email protected]> Reviewed-by: Nitin Gote <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit cc43422b3cc79eacff4c5a8ba0d224688ca9dd4f) Signed-off-by: Joonas Lahtinen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
1 parent 510a609 commit 5a7ae7b

File tree

1 file changed

+2
-1
lines changed

1 file changed

+2
-1
lines changed

drivers/gpu/drm/i915/gt/intel_ring_submission.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -575,7 +575,6 @@ static int ring_context_alloc(struct intel_context *ce)
575575
/* One ringbuffer to rule them all */
576576
GEM_BUG_ON(!engine->legacy.ring);
577577
ce->ring = engine->legacy.ring;
578-
ce->timeline = intel_timeline_get(engine->legacy.timeline);
579578

580579
GEM_BUG_ON(ce->state);
581580
if (engine->context_size) {
@@ -588,6 +587,8 @@ static int ring_context_alloc(struct intel_context *ce)
588587
ce->state = vma;
589588
}
590589

590+
ce->timeline = intel_timeline_get(engine->legacy.timeline);
591+
591592
return 0;
592593
}
593594

0 commit comments

Comments
 (0)