Skip to content

Conversation

@hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented Apr 16, 2017

What changes were proposed in this pull request?

Add a trait HasTrainingSummary to avoid code duplicate related to training summary.

Currently all the training summary use the similar pattern which can be generalized,


  private[ml] final var trainingSummary: Option[T] = None

  def hasSummary: Boolean = trainingSummary.isDefined

  def summary: T = trainingSummary.getOrElse...

  private[ml] def setSummary(summary: Option[T]): ...

Classes with the trait need to override setSummry. And for Java compatibility, they will also have to override summary method, otherwise the java code will regard all the summary class as Object due to a known issue with Scala.

How was this patch tested?

existing Java and Scala unit tests

@SparkQA
Copy link

SparkQA commented Apr 17, 2017

Test build #75845 has finished for PR 17654 at commit 3bca3b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 18, 2018

Test build #96194 has finished for PR 17654 at commit 3bca3b1.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Sep 27, 2018

Thanks for working on this, remove duplicated code is great. I'm curious as to why we couldn't remove some of the function calls to super and instead depend on inheritance?

If it's the types on the setters could we add another type parameter of the model?

extends RegressionModel[Vector, GeneralizedLinearRegressionModel]
with GeneralizedLinearRegressionBase with MLWritable {
with GeneralizedLinearRegressionBase with MLWritable
with HasTrainingSummary[GeneralizedLinearRegressionTrainingSummary]{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space before braces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there still isn't a space here.

override val numFeatures: Int = coefficients.size

private[ml]
override def setSummary(summary: Option[LinearRegressionTrainingSummary]): this.type =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, why do we need these definitions that just invoke the superclass method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

*
* @tparam T Summary instance type
*/
@Since("2.3.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's target 3.0.0 at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@holdenk
Copy link
Contributor

holdenk commented Dec 7, 2018

Gentle ping here, it's out of sync with master if you've got the time to bring it up to date that would be great.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Dec 13, 2018

Ah, just see this. Will update it now.

@SparkQA
Copy link

SparkQA commented Dec 14, 2018

Test build #100138 has finished for PR 17654 at commit ecef1d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking fine except for a few nits and a question about whether to keep the overrides.

/** Indicates whether a training summary exists for this model instance. */
@Since("1.5.0")
def hasSummary: Boolean = trainingSummary.isDefined
override def summary: LinearRegressionTrainingSummary = super.summary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, you don't need these overrides for this to work correctly, right? but I suppose it's necessary to preserve the @Since tag, which varies across implementations. But these were mostly introduced in 1.5.0, and where they have a later @Since tag, it matches when the class was introduced. I think it would also be coherent, for Spark 3.0, to remove these overrides, and mark the methods in the new trait as @Since 1.5.0. The result would be similar to what would happen if this had been introduced at the start. I don't feel strongly about it but what do you think? would clean up the code a little more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got an error message from Java side when removing summary

 /home/yuhao/workspace/github/hhbyyh/spark/mllib/src/test/java/org/apache/spark/ml/classification/JavaLogisticRegressionSuite.java:145: error: incompatible types: Object cannot be converted to LogisticRegressionTrainingSummary
[error]     LogisticRegressionTrainingSummary summary = model.summary();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK nevermind then. Thanks for checking.


/**
* Gets summary of model on training set. An exception is
* thrown if `trainingSummary == None`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: from the callers perspective they don't know what trainingSummary is. "if hasSummary is false"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit @hhbyyh - can we change this one too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks for checking.

extends RegressionModel[Vector, GeneralizedLinearRegressionModel]
with GeneralizedLinearRegressionBase with MLWritable {
with GeneralizedLinearRegressionBase with MLWritable
with HasTrainingSummary[GeneralizedLinearRegressionTrainingSummary]{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there still isn't a space here.

@SparkQA
Copy link

SparkQA commented Dec 15, 2018

Test build #100172 has finished for PR 17654 at commit 79ae7fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 17, 2018

Test build #100210 has finished for PR 17654 at commit 11ba5f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Dec 17, 2018

Merged to master

@srowen srowen closed this in c04ad17 Dec 17, 2018
@hhbyyh
Copy link
Contributor Author

hhbyyh commented Dec 17, 2018

Thanks for the review. @srowen

holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…te code

## What changes were proposed in this pull request?

Add a trait HasTrainingSummary to avoid code duplicate related to training summary.

Currently all the training summary use the similar pattern which can be generalized,

```

  private[ml] final var trainingSummary: Option[T] = None

  def hasSummary: Boolean = trainingSummary.isDefined

  def summary: T = trainingSummary.getOrElse...

  private[ml] def setSummary(summary: Option[T]): ...

```

Classes with the trait need to override `setSummry`. And for Java compatibility, they will also have to override `summary` method, otherwise the java code will regard all the summary class as Object due to a known issue with Scala.

## How was this patch tested?

existing Java and Scala unit tests

Closes apache#17654 from hhbyyh/hassummary.

Authored-by: Yuhao Yang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
rongou pushed a commit to rongou/spark that referenced this pull request Feb 1, 2019
…e in PySpark

## What changes were proposed in this pull request?

Python version of apache#17654

## How was this patch tested?

Existing Python unit test

Closes apache#23676 from huaxingao/spark26754.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…te code

## What changes were proposed in this pull request?

Add a trait HasTrainingSummary to avoid code duplicate related to training summary.

Currently all the training summary use the similar pattern which can be generalized,

```

  private[ml] final var trainingSummary: Option[T] = None

  def hasSummary: Boolean = trainingSummary.isDefined

  def summary: T = trainingSummary.getOrElse...

  private[ml] def setSummary(summary: Option[T]): ...

```

Classes with the trait need to override `setSummry`. And for Java compatibility, they will also have to override `summary` method, otherwise the java code will regard all the summary class as Object due to a known issue with Scala.

## How was this patch tested?

existing Java and Scala unit tests

Closes apache#17654 from hhbyyh/hassummary.

Authored-by: Yuhao Yang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…e in PySpark

## What changes were proposed in this pull request?

Python version of apache#17654

## How was this patch tested?

Existing Python unit test

Closes apache#23676 from huaxingao/spark26754.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants