Description
We've seen a deadlock in roslyn when performing a source-build of rc2 natively on ppc64le (using the Mono runtime). In a nutshell, it seems the logic behind this block of code in mono_runtime_class_init_full is not thread-safe:
/* see if the thread doing the initialization is already blocked on this thread */
gboolean is_blocked = TRUE;
blocked = GUINT_TO_POINTER (MONO_NATIVE_THREAD_ID_TO_UINT (lock->initializing_tid));
while ((pending_lock = (TypeInitializationLock*) g_hash_table_lookup (blocked_thread_hash, blocked))) {
    if (mono_native_thread_id_equals (pending_lock->initializing_tid, tid)) {
        if (!pending_lock->done) {
            mono_type_initialization_unlock ();
            goto return_true;
        } else {
            /* the thread doing the initialization is blocked on this thread,
               but on a lock that has already been freed. It just hasn't got
               time to awake */
            is_blocked = FALSE;
            break;
        }
    }
    blocked = GUINT_TO_POINTER (MONO_NATIVE_THREAD_ID_TO_UINT (pending_lock->initializing_tid));
}
To trigger the race, it is necessary that two threads each are in mono_runtime_class_init_full twice. This can happen since mono_runtime_class_init_full performs mono_runtime_try_invoke, which executes some unknown managed code, which in turn can trigger a recursive mono_runtime_class_init_full call. If both threads try to initialize the same two classes X and Y, but in reverse order (i.e. thread A accesses Y in the .cctor of X while thread B accesses X in the .cctor of Y), we may end up in a deadlock.
To trigger the deadlock, it seems that a third class Z also has to be involved. That can lead to erroneous logic in the code above: Assume thread A is working both on X and (recursively) Z, and after Z is done tries to work on Y. But thread B has already started working on Y and is now recursively blocked on Z before then also requiring X. Since thread A sees that thread B is blocked on Z, which it itself has already completed, the code above triggers:
/* the thread doing the initialization is blocked on this thread,
but on a lock that has already been freed. It just hasn't got
time to awake */
However, while it is true that thread B will indeed get woken up from Z's lock, that does not mean it will successfully complete initialization of Y, because Y's .cctor also needs X, where thread B will again get blocked on thread A after all.
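The difference between the two cases can be sketched with a small single-threaded model of the chain walk. This is only an illustration, not the runtime's code: `InitLock`, `ChainResult`, `check_chain`, and the `blocked_on` array (standing in for blocked_thread_hash) are hypothetical names, and all locking and waiting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { THREAD_A = 0, THREAD_B = 1, MAX_THREADS = 2 };

/* Hypothetical stand-in for TypeInitializationLock (only the fields used here). */
typedef struct {
    int  initializing_tid; /* thread running the .cctor guarded by this lock */
    bool done;             /* set once that .cctor has finished */
} InitLock;

typedef enum {
    WAIT_RECORDED,      /* no chain back to us: record ourselves as blocked and wait */
    CYCLE_RETURN_EARLY, /* owner is blocked on us, lock not done: do not wait */
    WAIT_ASSUMED_SAFE   /* owner is blocked on us, lock done: wait, assuming it wakes soon */
} ChainResult;

/* blocked_on[t] models blocked_thread_hash: the lock thread t waits on, or NULL. */
ChainResult check_chain(const InitLock *lock, int tid,
                        const InitLock *blocked_on[])
{
    int blocked = lock->initializing_tid;
    const InitLock *pending;

    assert(tid >= 0 && tid < MAX_THREADS);
    while ((pending = blocked_on[blocked]) != NULL) {
        if (pending->initializing_tid == tid)
            return pending->done ? WAIT_ASSUMED_SAFE : CYCLE_RETURN_EARLY;
        blocked = pending->initializing_tid;
    }
    return WAIT_RECORDED;
}

/* Two classes only: B waits on LX (owned by A, not done); A then asks for Y. */
ChainResult two_class_case(void)
{
    InitLock lx = { THREAD_A, false }, ly = { THREAD_B, false };
    const InitLock *blocked_on[MAX_THREADS] = { NULL, &lx };
    return check_chain(&ly, THREAD_A, blocked_on);
}

/* Three classes: B is still recorded as waiting on LZ (owned by A, already done)
   at the moment A asks for Y. */
ChainResult three_class_case(void)
{
    InitLock lz = { THREAD_A, true }, ly = { THREAD_B, false };
    const InitLock *blocked_on[MAX_THREADS] = { NULL, &lz };
    return check_chain(&ly, THREAD_A, blocked_on);
}
```

In this model, `two_class_case()` yields `CYCLE_RETURN_EARLY` (the cycle is caught), while `three_class_case()` yields `WAIT_ASSUMED_SAFE`: thread A goes on to wait on Y's lock based only on the stale B -> LZ entry, with nothing accounting for the fact that B will need X afterwards.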
In the example we're seeing, those nested initializations happen on various instantiations of Microsoft.CodeAnalysis.VisualBasic.Symbols.OverrideHidingHelper, which has this .cctor:
Shared Sub New()
OverrideHidingHelper(Of MethodSymbol).s_runtimeSignatureComparer = MethodSignatureComparer.RuntimeMethodSignatureComparer
OverrideHidingHelper(Of PropertySymbol).s_runtimeSignatureComparer = PropertySignatureComparer.RuntimePropertySignatureComparer
OverrideHidingHelper(Of EventSymbol).s_runtimeSignatureComparer = EventSignatureComparer.RuntimeEventSignatureComparer
End Sub
A detailed timeline for a possible deadlock is as follows. (I cannot guarantee that this timeline is exactly what happened, since I can only examine the deadlocked state, but it is one way we could have gotten there.)
Thread A                                            Thread B
mono_runtime_class_init_full (MonoVTable X)         mono_runtime_class_init_full (MonoVTable Y)
  mono_type_initialization_lock
  lookup X in type_initialization_hash: n/a
  allocate TypeInitializationLock LX
  insert into type_initialization_hash: X -> LX
  mono_type_initialization_unlock
                                                      mono_type_initialization_lock
                                                      lookup Y in type_initialization_hash: n/a
                                                      allocate TypeInitializationLock LY
                                                      insert into type_initialization_hash: Y -> LY
                                                      mono_type_initialization_unlock
  mono_runtime_try_invoke (X .cctor)                  mono_runtime_try_invoke (Y .cctor)
    ... managed code ...                                ... managed code ...
    mono_runtime_class_init_full (MonoVTable Z)         mono_runtime_class_init_full (MonoVTable Z)
      mono_type_initialization_lock
      lookup Z in type_initialization_hash: n/a
      allocate TypeInitializationLock LZ
      insert into type_initialization_hash: Z -> LZ
      mono_type_initialization_unlock
                                                          mono_type_initialization_lock
                                                          lookup Z in type_initialization_hash: LZ
                                                          LZ->initializing_tid is thread A
                                                          lookup A in blocked_thread_hash: n/a
                                                          insert into blocked_thread_hash: B -> LZ
                                                          mono_type_initialization_unlock
                                                          wait on LZ
      [... complete initialization of Z ...]
      LZ->done = TRUE;
      wakeup waiters on LZ
    mono_runtime_class_init_full (MonoVTable Z) returns
    ... managed code ...
    mono_runtime_class_init_full (MonoVTable Y)
      mono_type_initialization_lock
      lookup Y in type_initialization_hash: LY
      LY->initializing_tid is thread B
      lookup B in blocked_thread_hash: LZ
      LZ->initializing_tid is thread A
      LZ->done is TRUE
      /* the thread doing the initialization is blocked on this thread,
         but on a lock that has already been freed. It just hasn't got
         time to awake */
      mono_type_initialization_unlock
      wait on LY
                                                          wakeup from LZ
                                                          mono_type_initialization_lock
                                                          remove from blocked_thread_hash: B -> LZ
                                                          remove from type_initialization_hash: Z -> LZ
                                                          mono_type_initialization_unlock
                                                        mono_runtime_class_init_full (MonoVTable Z) returns
                                                        ... managed code ...
                                                        mono_runtime_class_init_full (MonoVTable X)
                                                          mono_type_initialization_lock
                                                          lookup X in type_initialization_hash: LX
                                                          LX->initializing_tid is thread A
                                                          lookup A in blocked_thread_hash: n/a
                                                          insert into blocked_thread_hash: B -> LX
                                                          mono_type_initialization_unlock
                                                          wait on LX
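To make the end state of this timeline explicit, here is a minimal sketch that treats it as a wait-for graph and checks it for a cycle. The names (`waits_for`, `in_wait_cycle`, and the two scenario helpers) are hypothetical and exist only for this illustration; the runtime keeps no such table. It assumes each thread waits on at most one lock at a time.

```c
#include <stdbool.h>

enum { THREAD_A = 0, THREAD_B = 1, NUM_THREADS = 2, NOT_WAITING = -1 };

/* waits_for[t] is the thread owning the lock that t is blocked on,
   or NOT_WAITING if t is running. */
bool in_wait_cycle(const int waits_for[NUM_THREADS], int start)
{
    int t = waits_for[start];

    /* A chain of distinct threads has at most NUM_THREADS hops. */
    for (int hops = 0; hops < NUM_THREADS && t != NOT_WAITING; hops++) {
        if (t == start)
            return true;   /* followed the chain back to where we began */
        t = waits_for[t];
    }
    return false;
}

/* Just before B requests X: A waits on LY (owned by B), B is running. */
bool deadlocked_before_b_requests_x(void)
{
    const int waits_for[NUM_THREADS] = { THREAD_B, NOT_WAITING };
    return in_wait_cycle(waits_for, THREAD_A);
}

/* End of the timeline: A waits on LY (owned by B), B waits on LX (owned by A). */
bool deadlocked_at_end_of_timeline(void)
{
    const int waits_for[NUM_THREADS] = { THREAD_B, THREAD_A };
    return in_wait_cycle(waits_for, THREAD_A) &&
           in_wait_cycle(waits_for, THREAD_B);
}
```

In this model there is no cycle while B is still running, and a cycle through both threads once B waits on LX. Note that in the timeline the lookup of A in blocked_thread_hash returns n/a at that point, so the chain walk performed by thread B has no entry for A to follow and cannot see this cycle.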
CC - @directhex @lambdageek @vargaz @akoeplinger
FYI - @giritrivedi @alhad-deshpande @janani66 @omajid @tmds