-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[ML][Minor] Separate estimator and model params for read/write test. #17151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #73846 has finished for PR 17151 at commit
|
|
Test build #73847 has finished for PR 17151 at commit
|
| val categoricalData: DataFrame = | ||
| TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2) | ||
| testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData) | ||
| testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seriously reduce the amount of code changed (and therefore make this much easier to review :p ) to just add an extra constructor:
def testEstimatorAndModelReadWrite[
E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
estimator: E,
dataset: Dataset[_],
testParams: Map[String, Any],
checkModelData: (M, M) => Unit): Unit = {
testEstimatorAndModelReadWrite(estimator, dataset, testParams, testParams, checkModelData)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethah Thanks for your kindly remind. I was planing to write as your suggestion before I'm thinking:
- Actually all
Modelonly need to extends a fraction ofParamsfromEstimator, so alltestEstimatorAndModelReadWrite(estimator, dataset, testParams, checkModelData)should be changed totestEstimatorAndModelReadWrite(estimator, dataset, testEstimatorParams, testModelParams, checkModelData)eventually. I explicitly write with the later way in test suites to remind developers should separate their estimator and model params when adding new algorithms' read/write test. I'm afraid that developers are not aware of the separation if they refer other test suites and find almost all test cases only pass intestParams. - Though in the currently change, we pass in
allParamSettingsto bothtestEstimatorParamsandtestModelParams, this is because they share the same params set. Others likeALSwill be pass in separate params. I think we should push forward to refactor***and***Modelto separate their params, which could make models more succinct. - If this is a public API, I totally agree with you. However, this is an internal auxiliary function, I think all test cases will need to pass in separate params eventually, so I settle a matter at one go.
This is my two cents, I'm still open to hear your thoughts. If you have strongly opinion, I can update according your suggestion. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I still think it's better to just add the extra constructor, but I don't feel strongly about it. So we can proceed with whatever you feel is best. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to let any refactoring of these tests happen as-needed. If there are specific cases that need to be done now, we should create JIRAs to track them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually almost all models are extends lots of params which are not necessary, I'd like to remove these params for models as todo list. I'll create JIRAs to track them. I'll merge this first since it blocks #17117. Thanks.
sethah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| val categoricalData: DataFrame = | ||
| TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2) | ||
| testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData) | ||
| testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to let any refactoring of these tests happen as-needed. If there are specific cases that need to be done now, we should create JIRAs to track them.
|
Merged into master. Thanks for reviewing. |
What changes were proposed in this pull request?
Since we allow
EstimatorandModelnot always share same params (seeALSParamsandALSModelParams), we should pass in test params for estimator and model separately in functiontestEstimatorAndModelReadWrite.How was this patch tested?
Existing tests.