During unload of rds_rdma on older kernels, we sometimes observe the
following general protection fault (slightly edited for brevity):

    Unregistered RDS/infiniband transport
    general protection fault: 0000 [#1] SMP PTI
    5.4.17-2136.330.1.oldis_combo.el8uek.v01.x86_64 #2
    Workqueue: krds_cp_wq#470/0 rds_up_or_down_worker [rds]
    RIP: 0010:rds_up_or_down_worker+0x54/0x2e0 [rds]
    Call Trace:
     process_one_work+0x1bb/0x3a9
     worker_thread+0x37/0x3b2
     kthread+0x120/0x136
     ret_from_fork+0x2b/0x36

Using the newer uek-7-u3 kernels, v5.15.0-308.179.6.11 and
v5.15.0-311.185.2, the bug manifests itself as:

    BUG: kernel NULL pointer dereference, address: 0000000000000500
    Workqueue: krds_cp_wq#97/0 rds_up_or_down_worker [rds]
    RIP: 0010:xas_start+0x22/0xf0
    Call Trace:
     xas_load+0x8/0x91
     xa_load+0x52/0x95
     rds_ib_get_client_data+0x17/0x30 [rds_rdma]
     rds_ib_setup_qp+0x67/0xa10 [rds_rdma]
     rds_ib_cm_accept+0x105/0x360 [rds_rdma]
     rds_ib_conn_path_connect+0x1e1/0x650 [rds_rdma]
     rds_up_or_down_worker+0x1ff/0x280 [rds]
     process_one_work+0x1ee/0x3c6
     worker_thread+0x53/0x3e4
     kthread+0x127/0x144
     ? set_kthread_struct+0x60/0x52
     ret_from_fork+0x1f/0x2d

We fix this by not re-queuing the reconnect_worker while the module is
being torn down.
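
For illustration only, the guard can be modelled in user-space C as
below. This is a minimal sketch, not the patch itself: the names
conn_path_model, module_unloading, model_begin_unload() and
model_requeue_reconnect() are all hypothetical, and a plain counter
stands in for the kernel workqueue. It only shows the idea that a
teardown flag is checked before the work is re-armed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-path state; not the real RDS struct. */
struct conn_path_model {
    int queued; /* number of times the reconnect work was (re-)queued */
};

/* Hypothetical flag, set once module teardown has started. */
static atomic_bool module_unloading = false;

/* Modelled first step of the module exit path. */
static void model_begin_unload(void)
{
    atomic_store(&module_unloading, true);
}

/*
 * Re-queue the reconnect work unless teardown has begun.
 * Returns true if the work was queued, false if it was skipped.
 */
static bool model_requeue_reconnect(struct conn_path_model *cp)
{
    assert(cp != NULL);
    if (atomic_load(&module_unloading))
        return false; /* tearing down: do not re-arm the worker */
    cp->queued++;
    return true;
}
```

In this model, a call to model_requeue_reconnect() made after
model_begin_unload() leaves the counter untouched, so no work is left
pending that could run against state the unload path has already freed.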
Orabug: 38169301
Fixes: ad3b8a5 ("net/rds: serialize up+down-work to relax strict ordering")
Signed-off-by: Håkon Bugge <[email protected]>
Tested-by: John Fitzgerald <[email protected]>
Tested-by: Håkon Bugge <[email protected]>
Reviewed-by: Sharath Srinivasan <[email protected]>