
Commit bef5502

xuejiufei authored and torvalds committed
ocfs2/dlm: ignore cleaning the migration mle that is inuse
We have found that the migration source triggers a BUG when the target goes down during migration: the refcount of the mle is already zero when it is put. The situation is as follows:

dlm_migrate_lockres
  dlm_add_migration_mle
  dlm_mark_lockres_migrating
  dlm_get_mle_inuse
  <<<<<< Now the refcount of the mle is 2.
  dlm_send_one_lockres, then wait for the target to become the
  new master.
  <<<<<< o2hb detects that the target is down and cleans up the
  migration mle. Now the refcount is 1.

dlm_migrate_lockres then wakes up, finds that the target has gone down, and puts the mle twice, which triggers the BUG with the following message:

  "ERROR: bad mle: ".

Signed-off-by: Jiufei Xue <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
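The refcount arithmetic in the message above can be replayed in a small user-space model. This is a sketch, not kernel code: `struct mle_model` and the `model_*` and `replay_target_death()` helpers are hypothetical, the kref is reduced to a plain `int`, and locking is ignored. The `with_fix` flag toggles the in-use check that this commit adds to `dlm_clean_master_list()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of a migration MLE: only the two
 * counters involved in this bug, not the kernel's struct. */
struct mle_model {
	int refs;  /* stands in for the mle_refs kref */
	int inuse; /* stands in for mle->inuse */
};

/* Like __dlm_put_mle: dropping a reference when the count is already
 * zero is the "bad mle" BUG. The model reports it instead of crashing. */
static int model_put(struct mle_model *m)
{
	if (m->refs == 0) {
		fprintf(stderr, "bad mle: %p\n", (void *)m);
		return -1;
	}
	m->refs--;
	return 0;
}

/* Like dlm_get_mle_inuse: mark the MLE in use and take a reference. */
static void model_get_inuse(struct mle_model *m)
{
	m->inuse++;
	m->refs++;
}

/* Like dlm_put_mle_inuse: clear in-use and drop the reference. */
static int model_put_inuse(struct mle_model *m)
{
	assert(m->inuse > 0); /* must pair with model_get_inuse() */
	m->inuse--;
	return model_put(m);
}

/* Heartbeat-driven cleanup of a migration MLE whose target died.
 * with_fix models the new check: skip an MLE that is still in use. */
static int model_clean(struct mle_model *m, bool with_fix)
{
	if (with_fix && m->inuse)
		return 0;
	return model_put(m);
}

/* Replays the sequence from the commit message. Returns 0 if all puts
 * were balanced and the MLE ends at refcount 0, -1 otherwise. */
int replay_target_death(bool with_fix)
{
	struct mle_model m = { .refs = 1, .inuse = 0 }; /* after dlm_add_migration_mle */
	int ret = 0;

	model_get_inuse(&m);              /* refcount is now 2 */
	ret |= model_clean(&m, with_fix); /* o2hb sees the target die */
	/* dlm_migrate_lockres wakes up, sees the dead target, puts twice */
	ret |= model_put(&m);
	ret |= model_put_inuse(&m);
	return (ret == 0 && m.refs == 0) ? 0 : -1;
}
```

In this model, `replay_target_death(false)` reports the put on a zero refcount (its stand-in for the BUG) and returns -1, while `replay_target_death(true)` leaves cleanup to the migrating thread and returns 0.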
1 parent 1cce4df commit bef5502


1 file changed, 15 insertions(+), 11 deletions(-)


fs/ocfs2/dlm/dlmmaster.c

```diff
@@ -2519,6 +2519,11 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 	spin_lock(&dlm->master_lock);
 	ret = dlm_add_migration_mle(dlm, res, mle, &oldmle, name,
 				    namelen, target, dlm->node_num);
+	/* get an extra reference on the mle.
+	 * otherwise the assert_master from the new
+	 * master will destroy this.
+	 */
+	dlm_get_mle_inuse(mle);
 	spin_unlock(&dlm->master_lock);
 	spin_unlock(&dlm->spinlock);
 
@@ -2554,6 +2559,7 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 		if (mle_added) {
 			dlm_mle_detach_hb_events(dlm, mle);
 			dlm_put_mle(mle);
+			dlm_put_mle_inuse(mle);
 		} else if (mle) {
 			kmem_cache_free(dlm_mle_cache, mle);
 			mle = NULL;
@@ -2571,17 +2577,6 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
 	 * ensure that all assert_master work is flushed. */
 	flush_workqueue(dlm->dlm_worker);
 
-	/* get an extra reference on the mle.
-	 * otherwise the assert_master from the new
-	 * master will destroy this.
-	 * also, make sure that all callers of dlm_get_mle
-	 * take both dlm->spinlock and dlm->master_lock */
-	spin_lock(&dlm->spinlock);
-	spin_lock(&dlm->master_lock);
-	dlm_get_mle_inuse(mle);
-	spin_unlock(&dlm->master_lock);
-	spin_unlock(&dlm->spinlock);
-
 	/* notify new node and send all lock state */
 	/* call send_one_lockres with migration flag.
 	 * this serves as notice to the target node that a
```
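The hunks in `dlm_migrate_lockres()` move `dlm_get_mle_inuse()` from after `flush_workqueue()` to immediately after `dlm_add_migration_mle()`, where `dlm->spinlock` and `dlm->master_lock` are already held. Presumably this ensures `mle->inuse` is set before the heartbeat cleanup can look at the MLE. Because the in-use reference now exists earlier, the `if (mle_added)` failure branch must drop it as well, which is what the added `dlm_put_mle_inuse(mle)` does. The minimal model below illustrates that balance; it is not kernel code, and `struct mle_model` and `early_error_leftover()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: an MLE reduced to its refcount and in-use count. */
struct mle_model {
	int refs;
	int inuse;
};

/* Runs the `if (mle_added)` failure branch of dlm_migrate_lockres and
 * returns the references left over (0 = balanced, >0 = leaked).
 *   get_at_add: dlm_get_mle_inuse() right after dlm_add_migration_mle()
 *               (first hunk; previously it happened after this branch)
 *   put_inuse:  the branch also calls dlm_put_mle_inuse() (second hunk) */
int early_error_leftover(bool get_at_add, bool put_inuse)
{
	struct mle_model m = { .refs = 1, .inuse = 0 }; /* dlm_add_migration_mle */

	if (get_at_add) { /* dlm_get_mle_inuse */
		m.inuse++;
		m.refs++;
	}

	m.refs--;         /* dlm_put_mle */
	if (put_inuse) {  /* dlm_put_mle_inuse */
		assert(m.inuse > 0); /* needs a matching get */
		m.inuse--;
		m.refs--;
	}
	return m.refs;
}
```

With the get moved but no extra put, `early_error_leftover(true, false)` returns 1, a leaked reference; adding the put, as the second hunk does, brings it back to 0.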
```diff
@@ -3312,6 +3307,15 @@ void dlm_clean_master_list(struct dlm_ctxt *dlm, u8 dead_node)
 			    mle->new_master != dead_node)
 				continue;
 
+			if (mle->new_master == dead_node && mle->inuse) {
+				mlog(ML_NOTICE, "%s: target %u died during "
+						"migration from %u, the MLE is "
+						"still keep used, ignore it!\n",
+						dlm->name, dead_node,
+						mle->master);
+				continue;
+			}
+
 			/* If we have reached this point, this mle needs to be
 			 * removed from the list and freed. */
 			dlm_clean_migration_mle(dlm, mle);
```
