Description
Let's make the error message more actionable.
I would recommend suggesting similarly named column(s):
- $"Provided {columnPurpose} column '{columnName}' not found in training data."
+ $"Provided {columnPurpose} column '{columnName}' not found in training data. Did you mean '{closestNamed}'?"
For my current example, this would print: Provided ignored column 'tagMaxTotalItem' not found in training data. Did you mean 'tagMaxTotalItems'?
I'd recommend using Levenshtein distance to find the closest named column (code).
Code location:
machinelearning/src/Microsoft.ML.AutoML/Utils/UserInputValidationUtil.cs
Lines 248 to 252 in 5dbfd8a
    var nullableColumn = trainData.Schema.GetColumnOrNull(columnName);
    if (nullableColumn == null)
    {
        throw new ArgumentException($"Provided {columnPurpose} column '{columnName}' not found in training data.");
    }
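A minimal sketch of how the suggestion could be computed, not a definitive implementation. `ColumnNameSuggester`, `LevenshteinDistance`, `FindClosest`, and `BuildMessage` are hypothetical names used for illustration and do not exist in Microsoft.ML.AutoML. The sketch assumes the caller can supply the list of column names present in the training data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Microsoft.ML.AutoML.
public static class ColumnNameSuggester
{
    // Levenshtein distance via dynamic programming, keeping only two rows.
    public static int LevenshteinDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitutionCost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insertion, deletion
                    previous[j - 1] + substitutionCost);           // substitution or match
            }

            // Swap rows so that 'previous' holds the row just computed.
            var temp = previous;
            previous = current;
            current = temp;
        }

        return previous[b.Length];
    }

    // Returns the candidate with the smallest distance, or null if there are none.
    public static string FindClosest(string name, IEnumerable<string> candidates)
    {
        string closest = null;
        int bestDistance = int.MaxValue;
        foreach (var candidate in candidates)
        {
            int distance = LevenshteinDistance(name, candidate);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                closest = candidate;
            }
        }
        return closest;
    }

    // Builds the proposed error message; the suggestion is omitted if no columns exist.
    public static string BuildMessage(
        string columnPurpose, string columnName, IEnumerable<string> availableColumns)
    {
        var message = $"Provided {columnPurpose} column '{columnName}' not found in training data.";
        var closestNamed = FindClosest(columnName, availableColumns);
        return closestNamed == null ? message : $"{message} Did you mean '{closestNamed}'?";
    }
}
```

For example, `ColumnNameSuggester.BuildMessage("ignored", "tagMaxTotalItem", new[] { "label", "tagMaxTotalItems", "price" })` returns the message from my example, because the distance between 'tagMaxTotalItem' and 'tagMaxTotalItems' is 1. In the validation code the candidate names would presumably come from `trainData.Schema`; that wiring is an assumption and is left out of the sketch. A real implementation might also skip the suggestion when the smallest distance is large relative to the name length, so that unrelated columns are not suggested.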
Background:
It took me about 20 minutes to debug why this error was occurring (obvious in retrospect). My column existed in the dataset, in my loader function, in my IDataView, and so on; it was simply misspelled ("tagMaxTotalItem" instead of "tagMaxTotalItems").
Improving the usability of this error message will save future users' time.
