-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20351] [ML] Add trait hasTrainingSummary to replace the duplicate code #17654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #75845 has finished for PR 17654 at commit
|
|
Test build #96194 has finished for PR 17654 at commit
|
|
Thanks for working on this, remove duplicated code is great. I'm curious as to why we couldn't remove some of the function calls to super and instead depend on inheritance? If it's the types on the setters could we add another type parameter of the model? |
| extends RegressionModel[Vector, GeneralizedLinearRegressionModel] | ||
| with GeneralizedLinearRegressionBase with MLWritable { | ||
| with GeneralizedLinearRegressionBase with MLWritable | ||
| with HasTrainingSummary[GeneralizedLinearRegressionTrainingSummary]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: space before braces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there still isn't a space here.
| override val numFeatures: Int = coefficients.size | ||
|
|
||
| private[ml] | ||
| override def setSummary(summary: Option[LinearRegressionTrainingSummary]): this.type = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, why do we need these definitions that just invoke the superclass method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
| * | ||
| * @tparam T Summary instance type | ||
| */ | ||
| @Since("2.3.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's target 3.0.0 at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
|
Gentle ping here, it's out of sync with master if you've got the time to bring it up to date that would be great. |
|
Ah, just see this. Will update it now. |
|
Test build #100138 has finished for PR 17654 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking fine except for a few nits and a question about whether to keep the overrides.
| /** Indicates whether a training summary exists for this model instance. */ | ||
| @Since("1.5.0") | ||
| def hasSummary: Boolean = trainingSummary.isDefined | ||
| override def summary: LinearRegressionTrainingSummary = super.summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, you don't need these overrides for this to work correctly, right? but I suppose it's necessary to preserve the @Since tag, which varies across implementations. But these were mostly introduced in 1.5.0, and where they have a later @Since tag, it matches when the class was introduced. I think it would also be coherent, for Spark 3.0, to remove these overrides, and mark the methods in the new trait as @Since 1.5.0. The result would be similar to what would happen if this had been introduced at the start. I don't feel strongly about it but what do you think? would clean up the code a little more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got an error message from Java side when removing summary
/home/yuhao/workspace/github/hhbyyh/spark/mllib/src/test/java/org/apache/spark/ml/classification/JavaLogisticRegressionSuite.java:145: error: incompatible types: Object cannot be converted to LogisticRegressionTrainingSummary
[error] LogisticRegressionTrainingSummary summary = model.summary();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK nevermind then. Thanks for checking.
|
|
||
| /** | ||
| * Gets summary of model on training set. An exception is | ||
| * thrown if `trainingSummary == None`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: from the callers perspective they don't know what trainingSummary is. "if hasSummary is false"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more nit @hhbyyh - can we change this one too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Thanks for checking.
| extends RegressionModel[Vector, GeneralizedLinearRegressionModel] | ||
| with GeneralizedLinearRegressionBase with MLWritable { | ||
| with GeneralizedLinearRegressionBase with MLWritable | ||
| with HasTrainingSummary[GeneralizedLinearRegressionTrainingSummary]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there still isn't a space here.
|
Test build #100172 has finished for PR 17654 at commit
|
|
Test build #100210 has finished for PR 17654 at commit
|
|
Merged to master |
|
Thanks for the review. @srowen |
…te code ## What changes were proposed in this pull request? Add a trait HasTrainingSummary to avoid code duplicate related to training summary. Currently all the training summary use the similar pattern which can be generalized, ``` private[ml] final var trainingSummary: Option[T] = None def hasSummary: Boolean = trainingSummary.isDefined def summary: T = trainingSummary.getOrElse... private[ml] def setSummary(summary: Option[T]): ... ``` Classes with the trait need to override `setSummry`. And for Java compatibility, they will also have to override `summary` method, otherwise the java code will regard all the summary class as Object due to a known issue with Scala. ## How was this patch tested? existing Java and Scala unit tests Closes apache#17654 from hhbyyh/hassummary. Authored-by: Yuhao Yang <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…e in PySpark ## What changes were proposed in this pull request? Python version of apache#17654 ## How was this patch tested? Existing Python unit test Closes apache#23676 from huaxingao/spark26754. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…te code ## What changes were proposed in this pull request? Add a trait HasTrainingSummary to avoid code duplicate related to training summary. Currently all the training summary use the similar pattern which can be generalized, ``` private[ml] final var trainingSummary: Option[T] = None def hasSummary: Boolean = trainingSummary.isDefined def summary: T = trainingSummary.getOrElse... private[ml] def setSummary(summary: Option[T]): ... ``` Classes with the trait need to override `setSummry`. And for Java compatibility, they will also have to override `summary` method, otherwise the java code will regard all the summary class as Object due to a known issue with Scala. ## How was this patch tested? existing Java and Scala unit tests Closes apache#17654 from hhbyyh/hassummary. Authored-by: Yuhao Yang <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…e in PySpark ## What changes were proposed in this pull request? Python version of apache#17654 ## How was this patch tested? Existing Python unit test Closes apache#23676 from huaxingao/spark26754. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
Add a trait HasTrainingSummary to avoid code duplicate related to training summary.
Currently all the training summary use the similar pattern which can be generalized,
Classes with the trait need to override
setSummry. And for Java compatibility, they will also have to overridesummarymethod, otherwise the java code will regard all the summary class as Object due to a known issue with Scala.How was this patch tested?
existing Java and Scala unit tests