
Conversation

@jkbradley
Member

What changes were proposed in this pull request?

  • Made SingularMatrixException private[ml]
  • WeightedLeastSquares: Changed to allow tol >= 0 instead of only tol > 0 (see the sketch below)
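
A minimal sketch of the relaxed tolerance check (the method name and message are illustrative, not the actual WeightedLeastSquares code):

```scala
// Hypothetical illustration only: a convergence tolerance of exactly 0.0 is now
// accepted, whereas previously only strictly positive values were allowed.
def validateTol(tol: Double): Unit = {
  require(tol >= 0.0, s"tol must be >= 0 but was $tol") // was effectively: require(tol > 0.0, ...)
}
```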

How was this patch tested?

existing tests

@jkbradley
Member Author

@sethah @yanboliang I just saw your PRs for SPARK-17748. Awesome change. I noticed a few nits along the way.

The only major item is making SingularMatrixException private[ml]. This would be the first public Exception type we have in MLlib, I believe. If it is to be public, I'd prefer that it live in spark.ml.linalg, so we could move it there instead of making it private.

Also, I'm not a big fan of using Exceptions to handle logic. I'd prefer to change the use of CholeskySolver to pass the failure code, rather than relying on exceptions. If that sounds good to you, then I'll make a follow-up JIRA for it.

Thanks!
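
To make the failure-code idea concrete, here is a minimal sketch. All names (NormalEquationResult, solveNormalEquation, the stand-in singularity check) are hypothetical and do not reflect the actual CholeskySolver internals:

```scala
object FailureCodeSketch extends App {
  // Hypothetical result type returned instead of throwing SingularMatrixException.
  sealed trait NormalEquationResult
  case class Solved(coefficients: Array[Double]) extends NormalEquationResult
  case object Singular extends NormalEquationResult

  def solveNormalEquation(ata: Array[Double], atb: Array[Double]): NormalEquationResult = {
    // A real implementation would call LAPACK and map a nonzero info code to Singular;
    // an all-zero Gram matrix is used here only as a trivial stand-in.
    if (ata.forall(_ == 0.0)) Singular else Solved(atb.clone())
  }

  // The caller handles the singular case explicitly via pattern matching
  // (e.g. falling back to the iterative solver) rather than catching an exception.
  solveNormalEquation(Array(0.0), Array(1.0)) match {
    case Solved(coef) => println(s"normal solver coefficients: ${coef.mkString(", ")}")
    case Singular     => println("singular normal equations; falling back to quasi-Newton")
  }
}
```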

@SparkQA

SparkQA commented Nov 5, 2016

Test build #68174 has finished for PR 15779 at commit 2bee341.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Contributor

sethah commented Nov 5, 2016

+1 on removing the use of exceptions. I thought it was a bit of an awkward solution to begin with. Thanks a lot for this PR; I will take a look soon.

@sethah left a comment (Contributor)

LGTM other than the two comments.

* (singular).
*/
-class SingularMatrixException(message: String, cause: Throwable)
+private[ml] class SingularMatrixException(message: String, cause: Throwable)
Contributor

private[spark] as Jenkins was nice enough to point out
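
For reference, a minimal illustration of the difference between the two access scopes under discussion (the class names are made up; only the modifiers and the enclosing packages mirror Spark's layout):

```scala
package org.apache.spark.ml.optim {
  // private[ml]: accessible anywhere under org.apache.spark.ml, but not from other
  // org.apache.spark packages -- the narrower scope originally proposed in this PR.
  private[ml] class MlScopedException(msg: String) extends Exception(msg)

  // private[spark]: accessible anywhere under org.apache.spark, the wider scope that
  // the Jenkins build failure pointed to.
  private[spark] class SparkScopedException(msg: String) extends Exception(msg)
}
```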

* The default value is "auto" which means that the solver algorithm is
* selected automatically.
*
* The Normal Equation solver is limited to a few thousand features; when needed,
Contributor

Is this a bit misleading? If I set solver = "normal" and my dataset contains >4096 features, an exception is thrown instead of automatically falling back to iterative methods.

Contributor

+1 @sethah It will only fall back when solver = "auto"; we should make this clearer.
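
To make the clarified behavior concrete, a small sketch of how a caller could stay within the limit (numFeatures is an assumed example value; MAX_FEATURES_FOR_NORMAL_SOLVER is the constant added in this PR):

```scala
import org.apache.spark.ml.regression.LinearRegression

// Assumed example value; in practice this would come from the training data's schema.
val numFeatures = 5000

// With solver = "normal" and more features than the documented limit an exception is
// thrown, so only request the normal-equation solver when the feature count allows it.
val solver =
  if (numFeatures <= LinearRegression.MAX_FEATURES_FOR_NORMAL_SOLVER) "normal" else "auto"

val lr = new LinearRegression().setSolver(solver)
```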

@yanboliang
Contributor

@jkbradley @sethah The idea of using a failure code rather than an exception sounds good to me. I originally thought users could catch the exception and infer the cause of the failure from it, but after rethinking it, I think it's an expert-level exception that few users would want to intercept, so a failure code makes more sense. Thanks for the follow-up work.

@srowen
Member

srowen commented Nov 7, 2016

I agree that exceptions can become, quite surprisingly, part of a public API, so it has to be done with care. I am not sure I agree that failure codes are better; they're effectively exceptions that don't use a language mechanism and are even more ignorable. If it's simply a status, that makes sense; if it's indicating failure, you want it to be harder to ignore.

@jkbradley
Member Author

jkbradley commented Nov 8, 2016

I should have clarified: I meant for us to use the failure code internally. If the linear algebra ops become more public, then we may need to consider better ways to return result codes. I made a JIRA: [https://issues.apache.org/jira/browse/SPARK-18341]

@SparkQA

SparkQA commented Nov 8, 2016

Test build #68312 has finished for PR 15779 at commit 72a9302.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang left a comment (Contributor)

Only one last nit; otherwise LGTM. Thanks.

* features to at most this number.
*/
@Since("2.1.0")
val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES
Contributor

Would MAX_NUM_FEATURES_FOR_NORMAL_SOLVER be better?

Contributor

I prefer the current name. MAX_NUM_.... is longer but not any clearer to me. I don't feel too strongly about it though.

Member Author

Yep, I left out "NUM_" for conciseness.


/**
* When using [[LinearRegression.solver]] == "normal", the solver must limit the number of
* features to at most this number.
@sethah Nov 8, 2016 (Contributor)

Minor suggestion: perhaps an explanation as to why might be useful: "must limit the number of features to at most this number, because the entire covariance matrix X^T^X will be collected on the driver. This limit helps prevent memory overflow errors."

Member Author

Will do
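
For context, a rough back-of-the-envelope estimate of the scale behind that suggestion (the 4096 figure comes from the discussion above; the dense-storage assumption is a simplification):

```scala
// If the full dense covariance matrix X^T X were materialized on the driver for
// n = 4096 features, it would hold n * n doubles at 8 bytes each: about 128 MB.
// A packed upper-triangular representation would need roughly half of that.
val n = 4096L
val denseBytes = n * n * 8L
println(f"dense X^T X: ${denseBytes / (1024.0 * 1024.0)}%.0f MB") // ~128 MB
```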

@sethah
Contributor

sethah commented Nov 8, 2016

LGTM

@SparkQA

SparkQA commented Nov 8, 2016

Test build #68353 has finished for PR 15779 at commit 42ff98e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Thanks! I'll merge this with master and branch-2.1

ueshin pushed a commit to ueshin/apache-spark that referenced this pull request Nov 8, 2016
…lastic net


Author: Joseph K. Bradley <[email protected]>

Closes apache#15779 from jkbradley/wls-cleanups.

(cherry picked from commit 26e1c53)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit closed this in 26e1c53 Nov 8, 2016
@jkbradley
Member Author

Linking commit for branch-2.1 since it hasn't linked automatically: [https://github.com/apache/spark/commit/21bbf94b41fbd193e370a3820131e449aaf0e3db]

@jkbradley deleted the wls-cleanups branch November 8, 2016 21:39
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…lastic net


Author: Joseph K. Bradley <[email protected]>

Closes apache#15779 from jkbradley/wls-cleanups.