
Commit 6420a8d

kwachows authored and gregkh committed
accel/ivpu: Trigger device recovery on engine reset/resume failure
[ Upstream commit a47e36d ]

Trigger full device recovery when the driver fails to restore device state via engine reset and resume operations.

This is necessary because, even if submissions from a faulty context are blocked, the NPU may still process previously submitted faulty jobs if the engine reset fails to abort them. Such jobs can continue to generate faults and occupy device resources. When engine reset is ineffective, the only way to recover is to perform a full device recovery.

Fixes: dad945c ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW")
Cc: [email protected] # v6.15+
Signed-off-by: Karol Wachowski <[email protected]>
Reviewed-by: Lizhi Hou <[email protected]>
Signed-off-by: Jacek Lawrynowicz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
1 parent 397f3a7 commit 6420a8d

2 files changed, +11 -4 lines changed


drivers/accel/ivpu/ivpu_job.c

Lines changed: 4 additions & 2 deletions
@@ -849,7 +849,8 @@ void ivpu_context_abort_thread_handler(struct work_struct *work)
 	unsigned long id;

 	if (vdev->fw->sched_mode == VPU_SCHEDULING_MODE_HW)
-		ivpu_jsm_reset_engine(vdev, 0);
+		if (ivpu_jsm_reset_engine(vdev, 0))
+			return;

 	mutex_lock(&vdev->context_list_lock);
 	xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
@@ -865,7 +866,8 @@ void ivpu_context_abort_thread_handler(struct work_struct *work)
 	if (vdev->fw->sched_mode != VPU_SCHEDULING_MODE_HW)
 		return;

-	ivpu_jsm_hws_resume_engine(vdev, 0);
+	if (ivpu_jsm_hws_resume_engine(vdev, 0))
+		return;
 	/*
 	 * In hardware scheduling mode NPU already has stopped processing jobs
 	 * and won't send us any further notifications, thus we have to free job related resources

drivers/accel/ivpu/ivpu_jsm_msg.c

Lines changed: 7 additions & 2 deletions
@@ -7,6 +7,7 @@
 #include "ivpu_hw.h"
 #include "ivpu_ipc.h"
 #include "ivpu_jsm_msg.h"
+#include "ivpu_pm.h"
 #include "vpu_jsm_api.h"

 const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
@@ -163,8 +164,10 @@ int ivpu_jsm_reset_engine(struct ivpu_device *vdev, u32 engine)

 	ret = ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_ENGINE_RESET_DONE, &resp,
 				    VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
-	if (ret)
+	if (ret) {
 		ivpu_err_ratelimited(vdev, "Failed to reset engine %d: %d\n", engine, ret);
+		ivpu_pm_trigger_recovery(vdev, "Engine reset failed");
+	}

 	return ret;
 }
@@ -354,8 +357,10 @@ int ivpu_jsm_hws_resume_engine(struct ivpu_device *vdev, u32 engine)

 	ret = ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE, &resp,
 				    VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
-	if (ret)
+	if (ret) {
 		ivpu_err_ratelimited(vdev, "Failed to resume engine %d: %d\n", engine, ret);
+		ivpu_pm_trigger_recovery(vdev, "Engine resume failed");
+	}

 	return ret;
 }
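
The control flow this patch introduces can be illustrated with a small user-space sketch. It is not driver code: `struct mock_device` and every `mock_*` function are hypothetical stand-ins invented for illustration, the IPC round trip is replaced by a boolean flag, and the sketch assumes hardware scheduling mode throughout. It only shows the pattern visible in the diff: the reset/resume helpers request recovery on failure, and the abort handler returns early instead of continuing.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ivpu_device; not the real driver type. */
struct mock_device {
	bool reset_fails;       /* simulate a failed engine reset message */
	bool resume_fails;      /* simulate a failed engine resume message */
	int recovery_requests;  /* how many times recovery was requested */
	bool contexts_aborted;  /* handler reached the context-abort step */
	bool jobs_cleaned;      /* handler reached the job cleanup step */
};

/* Stand-in for ivpu_pm_trigger_recovery(): only records the request. */
static void mock_trigger_recovery(struct mock_device *dev, const char *reason)
{
	dev->recovery_requests++;
	printf("recovery requested: %s\n", reason);
}

/* Mirrors the patched helper shape: on failure, request recovery, then return the error. */
static int mock_reset_engine(struct mock_device *dev)
{
	int ret = dev->reset_fails ? -1 : 0;

	if (ret)
		mock_trigger_recovery(dev, "Engine reset failed");

	return ret;
}

static int mock_resume_engine(struct mock_device *dev)
{
	int ret = dev->resume_fails ? -1 : 0;

	if (ret)
		mock_trigger_recovery(dev, "Engine resume failed");

	return ret;
}

/* Mirrors the patched handler shape: stop as soon as reset or resume fails. */
void mock_context_abort_handler(struct mock_device *dev)
{
	if (mock_reset_engine(dev))
		return; /* recovery already requested by the helper */

	dev->contexts_aborted = true;

	if (mock_resume_engine(dev))
		return; /* skip job cleanup; recovery already requested */

	dev->jobs_cleaned = true;
}
```

In this sketch the recovery request lives in the helper rather than in the handler, matching where the diff places the `ivpu_pm_trigger_recovery()` calls, so any caller of the helper gets the same behavior and the handler only needs to check the return value.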
