AggregateTrainingStopManager is trying to cancel disposed tokens #6416

@ericstj

Description

This failure is occurring in multiple PRs during the AutoMLExperiment_throw_timeout_exception_when_ct_is_canceled_and_no_trial_completed_Async test:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-6415-merge-bd52c1f30a6d4e1990/Microsoft.ML.AutoML.Tests/1/console.cd8a8fcf.log?helixlogtype=result

#6412 (comment)

Starting test: Microsoft.ML.AutoML.Test.AutoMLExperimentTests.AutoMLExperiment_throw_timeout_exception_when_ct_is_canceled_and_no_trial_completed_Async
Unhandled exception: System.AggregateException: One or more errors occurred. (The CancellationTokenSource has been disposed.)
 ---> System.ObjectDisposedException: The CancellationTokenSource has been disposed.
   at System.Threading.CancellationTokenSource.Cancel()
   at Microsoft.ML.AutoML.AutoMLExperiment.<>c__DisplayClass26_1.<RunAsync>g__handler|4(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/AutoMLExperiment.cs:line 270
   at Microsoft.ML.AutoML.AggregateTrainingStopManager.<.ctor>b__4_0(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 129
   at Microsoft.ML.AutoML.TimeoutTrainingStopManager.<.ctor>b__5_0(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 72
   at Microsoft.ML.AutoML.CancellationTokenStopTrainingManager.<.ctor>b__5_0() in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 38
   at System.Threading.CancellationToken.<>c.<Register>b__12_0(Object obj)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.<ExecuteCallback>b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   --- End of inner exception stack trace ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   at System.Threading.CancellationTokenSource.TimerCallback(Object state)
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
Unhandled exception. System.AggregateException: One or more errors occurred. (The CancellationTokenSource has been disposed.)
 ---> System.ObjectDisposedException: The CancellationTokenSource has been disposed.
   at System.Threading.CancellationTokenSource.Cancel()
   at Microsoft.ML.AutoML.AutoMLExperiment.<>c__DisplayClass26_1.<RunAsync>g__handler|4(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/AutoMLExperiment.cs:line 270
   at Microsoft.ML.AutoML.AggregateTrainingStopManager.<.ctor>b__4_0(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 129
   at Microsoft.ML.AutoML.TimeoutTrainingStopManager.<.ctor>b__5_0(Object o, EventArgs e) in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 72
   at Microsoft.ML.AutoML.CancellationTokenStopTrainingManager.<.ctor>b__5_0() in /__w/1/s/src/Microsoft.ML.AutoML/AutoMLExperiment/IStopTrainingManager.cs:line 38
   at System.Threading.CancellationToken.<>c.<Register>b__12_0(Object obj)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.<ExecuteCallback>b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   --- End of inner exception stack trace ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   at System.Threading.CancellationTokenSource.TimerCallback(Object state)
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
Aborted (core dumped)

Bug appears to be here:

// only force-canceling running trials when there's completed trials.
// otherwise, wait for the current running trial to be completed.
if (_bestTrialResult != null)
    trialCancellationTokenSource.Cancel();

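I see that the handler is detached later in a finally block. Perhaps there is a race condition in which the stop event fires after trialCancellationTokenSource has been disposed but before the handler is detached?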
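
To make the suspected race concrete, here is a minimal standalone sketch. It is not the AutoML code: `StopSignal`, `DisposedTokenRaceSketch`, and both method names are hypothetical stand-ins, and the ordering of dispose versus detach is simulated by hand rather than taken from AutoMLExperiment.cs. It only relies on the documented behavior that `CancellationTokenSource.Cancel()` throws `ObjectDisposedException` once the source has been disposed, which matches the trace above.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a stop manager that raises an event
// when training should stop (not the real AutoML type).
public sealed class StopSignal
{
    public event EventHandler OnStopTraining;

    public void Raise() => OnStopTraining?.Invoke(this, EventArgs.Empty);
}

public static class DisposedTokenRaceSketch
{
    // Simulates the failure: the token source is disposed while the
    // handler is still attached, then the stop event fires.
    public static bool UnguardedHandlerThrows()
    {
        var signal = new StopSignal();
        var cts = new CancellationTokenSource();
        EventHandler handler = (o, e) => cts.Cancel();
        signal.OnStopTraining += handler;

        cts.Dispose(); // trial finished; handler not yet detached

        try
        {
            signal.Raise(); // late stop event reaches the disposed source
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
        finally
        {
            signal.OnStopTraining -= handler;
        }
    }

    // One possible mitigation (an assumption, not the project's fix):
    // detach before disposing, and tolerate a callback that was
    // already in flight when the source was disposed.
    public static bool GuardedHandlerThrows()
    {
        var signal = new StopSignal();
        var cts = new CancellationTokenSource();
        EventHandler handler = (o, e) =>
        {
            try
            {
                cts.Cancel();
            }
            catch (ObjectDisposedException)
            {
                // The trial already completed; nothing left to cancel.
            }
        };
        signal.OnStopTraining += handler;

        signal.OnStopTraining -= handler; // detach first
        cts.Dispose();                    // then dispose

        try
        {
            // Simulate a callback that captured the delegate before the detach.
            handler(signal, EventArgs.Empty);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

Calling `DisposedTokenRaceSketch.UnguardedHandlerThrows()` returns `true` and `GuardedHandlerThrows()` returns `false`. Detaching before disposing narrows the window, but a timer or cancellation callback that is already executing can still reach the handler, which is why the sketch also catches `ObjectDisposedException` inside it.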
