Documentation request: test data + model order #25091

@luisquintanilla

Description

Opening this issue to address questions about data splits and iterations that were posted in the dotnet/machinelearning-modelbuilder repo.

Original Issue Description:

Is your feature request related to a problem? Please describe.
I am concerned about how to evaluate the performance of the models, because the documentation does not clearly explain how the test data is split. I would also like to better estimate how long to run iterations when I already have experience with a related data set.

Describe the solution you'd like
I suggest adding the following to the documentation:

How the test data is split. For example, is it taken from the end of the data set, chosen randomly, or based on auto-selection of columns?
How the iterations are generated. They seem to run in a fixed order, or else my data sets are similar enough to produce the same order. For example, if iteration 201 performed best, then the next time I retrain I could stop after iteration 201, provided I know the order is fixed. If the order is not fixed, then I know to let my next data set run a bit longer.

For more details, see dotnet/machinelearning-modelbuilder#1530


Document Details

Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
