As laid out in #2498, we need scenarios covering the Evaluation functionality we want fully supported in V1.
- I can evaluate a model trained for any of my tasks on test data. The evaluation outputs metrics relevant to the task (e.g. AUC, accuracy, precision/recall, and F1 for binary classification).
- I can get the raw data (precision/recall pairs per threshold) that allows me to plot PR curves.
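For reference, the two scenarios above can be sketched with scikit-learn; this is only an illustration of the expected outputs (the labels, scores, and metric names here are hypothetical, not part of any existing API):

```python
import numpy as np
from sklearn.metrics import (
    roc_auc_score, accuracy_score, precision_score,
    recall_score, f1_score, precision_recall_curve,
)

# Hypothetical ground-truth labels and model scores for a binary task.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3])
y_pred = (y_score >= 0.5).astype(int)  # threshold scores into hard labels

# Scenario 1: task-relevant metrics for binary classification.
metrics = {
    "auc": roc_auc_score(y_true, y_score),
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

# Scenario 2: raw precision/recall points for plotting a PR curve.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
```

The key design point is that the PR-curve scenario returns the underlying arrays rather than a rendered plot, so the caller can chart them however they like.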