Commit 8467278

xuejiufei authored and sashalevin committed
ocfs2/dlm: ignore cleaning the migration mle that is inuse
[ Upstream commit bef5502 ]

We have found that migration source will trigger a BUG that the refcount
of mle is already zero before put when the target is down during
migration. The situation is as follows:

dlm_migrate_lockres
  dlm_add_migration_mle
  dlm_mark_lockres_migrating
  dlm_get_mle_inuse
  <<<<<< Now the refcount of the mle is 2.
  dlm_send_one_lockres and wait for the target to become the
  new master.
  <<<<<< o2hb detect the target down and clean the migration
  mle. Now the refcount is 1.

dlm_migrate_lockres woken, and put the mle twice when found the target
goes down which trigger the BUG with the following message:

  "ERROR: bad mle: ".

Signed-off-by: Jiufei Xue <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
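The reference-count sequence described in the commit message can be replayed with a small user-space model. This is only a sketch: `toy_mle`, `toy_put`, `toy_get_inuse`, `toy_put_inuse` and `toy_replay_unfixed` are hypothetical names invented for illustration, a plain integer stands in for the kernel's kref, and a flag stands in for the "ERROR: bad mle" BUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the mle; NOT the kernel's struct dlm_master_list_entry. */
struct toy_mle {
	int refs;      /* models the reference count */
	int inuse;     /* models mle->inuse */
	bool bad_put;  /* set when a put is attempted with refs already 0 */
};

static void toy_init(struct toy_mle *m)
{
	m->refs = 1;   /* a freshly added mle starts with one reference */
	m->inuse = 0;
	m->bad_put = false;
}

static void toy_put(struct toy_mle *m)
{
	if (m->refs == 0) {
		m->bad_put = true;  /* the real code reports "ERROR: bad mle" and BUGs */
		return;
	}
	m->refs--;
}

static void toy_get_inuse(struct toy_mle *m)
{
	assert(m->refs > 0);  /* only take an extra reference on a live entry */
	m->inuse++;
	m->refs++;
}

static void toy_put_inuse(struct toy_mle *m)
{
	m->inuse--;
	toy_put(m);
}

/* Replays the failing sequence; returns true if a put hit a zero refcount. */
static bool toy_replay_unfixed(void)
{
	struct toy_mle m;

	toy_init(&m);        /* dlm_add_migration_mle: refs = 1 */
	toy_get_inuse(&m);   /* dlm_get_mle_inuse:     refs = 2 */
	toy_put(&m);         /* heartbeat cleanup of the dead target: refs = 1 */
	toy_put(&m);         /* error path, first put:  refs = 0 */
	toy_put_inuse(&m);   /* error path, second put: refs already 0 */
	return m.bad_put;
}
```

Calling `toy_replay_unfixed()` reports the underflow, because the cleanup drops a reference that the migrating thread still expects to hold.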
1 parent c91d339 commit 8467278

1 file changed, +15 -11 lines changed

fs/ocfs2/dlm/dlmmaster.c

Lines changed: 15 additions & 11 deletions
@@ -2518,6 +2518,11 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 	spin_lock(&dlm->master_lock);
 	ret = dlm_add_migration_mle(dlm, res, mle, &oldmle, name,
 				    namelen, target, dlm->node_num);
+	/* get an extra reference on the mle.
+	 * otherwise the assert_master from the new
+	 * master will destroy this.
+	 */
+	dlm_get_mle_inuse(mle);
 	spin_unlock(&dlm->master_lock);
 	spin_unlock(&dlm->spinlock);

@@ -2553,6 +2558,7 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 		if (mle_added) {
 			dlm_mle_detach_hb_events(dlm, mle);
 			dlm_put_mle(mle);
+			dlm_put_mle_inuse(mle);
 		} else if (mle) {
 			kmem_cache_free(dlm_mle_cache, mle);
 			mle = NULL;
@@ -2570,17 +2576,6 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 	 * ensure that all assert_master work is flushed. */
 	flush_workqueue(dlm->dlm_worker);

-	/* get an extra reference on the mle.
-	 * otherwise the assert_master from the new
-	 * master will destroy this.
-	 * also, make sure that all callers of dlm_get_mle
-	 * take both dlm->spinlock and dlm->master_lock */
-	spin_lock(&dlm->spinlock);
-	spin_lock(&dlm->master_lock);
-	dlm_get_mle_inuse(mle);
-	spin_unlock(&dlm->master_lock);
-	spin_unlock(&dlm->spinlock);
-
 	/* notify new node and send all lock state */
 	/* call send_one_lockres with migration flag.
 	 * this serves as notice to the target node that a
@@ -3309,6 +3304,15 @@ void dlm_clean_master_list(struct dlm_ctxt *dlm, u8 dead_node)
 			    mle->new_master != dead_node)
 				continue;

+			if (mle->new_master == dead_node && mle->inuse) {
+				mlog(ML_NOTICE, "%s: target %u died during "
+						"migration from %u, the MLE is "
+						"still keep used, ignore it!\n",
+						dlm->name, dead_node,
+						mle->master);
+				continue;
+			}
+
 			/* If we have reached this point, this mle needs to be
 			 * removed from the list and freed. */
 			dlm_clean_migration_mle(dlm, mle);
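
The effect of the new check in `dlm_clean_master_list` can be sketched outside the kernel as a filter over an array. Everything here is an assumption made for illustration: `toy_entry` and `toy_clean_list` are hypothetical names, the array replaces the real master hash walk, and the `cleaned` flag replaces the call to `dlm_clean_migration_mle`. Only the two skip conditions mirror the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy migration entry; NOT the kernel's struct dlm_master_list_entry. */
struct toy_entry {
	unsigned new_master;  /* migration target node */
	unsigned master;      /* migration source node */
	int inuse;            /* non-zero while the migrating thread holds it */
	bool cleaned;         /* set where the kernel would clean and free the mle */
};

/* Walks the entries after dead_node dies; returns how many were cleaned. */
static size_t toy_clean_list(struct toy_entry *e, size_t n, unsigned dead_node)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < n; i++) {
		/* Entries that do not involve the dead node are left alone. */
		if (e[i].master != dead_node && e[i].new_master != dead_node)
			continue;

		/* New check: the target died, but the migrating thread still
		 * uses this entry and will drop its references itself. */
		if (e[i].new_master == dead_node && e[i].inuse)
			continue;

		e[i].cleaned = true;
		cleaned++;
	}
	return cleaned;
}
```

With node 2 dead, an entry targeting node 2 with `inuse` set is skipped, an otherwise identical entry with `inuse` clear is cleaned, and an entry that does not mention node 2 is untouched.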
