-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17748][FOLLOW-UP][ML] Reorg variables of WeightedLeastSquares. #15621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@sethah I wish you would not mind, I made some reorg for variables of |
| } | ||
|
|
||
| /** Construct A^T^ A from summary statistics. */ | ||
| /** Construct A^T^ A (append bias if necessary). */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary statistics are in original space, here we construct A^T^ A from standardized space.
|
Test build #67506 has finished for PR 15621 at commit
|
| // scale aBar to standardized space in-place | ||
| while (i < numFeatures) { | ||
| if (aStd(i) == 0.0) { | ||
| _aBar.values(i) = 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General suggestion, the following two are not equivalent:
while (j < numFeatures) {
_abar.values(i) /= aStd(i)
}val _abarValues = _abar.values
while (j < numFeatures) {
_abarValues(i) /= aStd(i)
}This is because of how scala compiles class member vals, you are actually accessing a getter method when you call _abar.values. For speed, I think we should revert this back to making explicit pointers to the values. In this case, this probably isn't a big deal and likely wouldn't matter for anything but aaBar since it's k by k (roughly) in dimension. Still, I don't think it's a bad idea. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I total agree. I thought the impact of this issue is trivial, but making implicit pointer definitely should be better. I made update for all variables. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're probably right. The only thing it might have a real impact is aaBar, but since we use while loops for speed, it seems better to also use this optimization as well.
| } | ||
| j += 1 | ||
| } | ||
| val bBar = summary.bBar / bStd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the names bBarStd and bbBarStd here, as I think they're more descriptive. But it is not a strong preference so I'll leave it up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use bBar, bbBar, aBar, abBar, aaBar in standardized space always, so I did not append Std as suffix for all variables. If we only add suffix for bBar and bbBar, developers may misinterpret that other variables are not in standardized space.
| aaBarValues(p) = 0.0 | ||
| // scale abBar to standardized space in-place | ||
| while (i < numFeatures) { | ||
| if (aStd(i) == 0.0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar thing here, you're calling aStd.apply(i) which in turn calls an internal getter for aStd.values which then indexes the array.
sethah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor thing, otherwise LGTM.
| j += 1 | ||
| _aBar | ||
| } | ||
| val aBarValues = aBar.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just make the above be val aBarValues = { ... } returning _aBarValues from the code block instead of _abar. Since we make a pointer to aBar but then only use it to get its values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass aBar to NormalEquationSolver.solve(), so we can't remove the declaration of aBar. Or other we can convert the argument type of NormalEquationSolver.solve() from DenseVector to Array, but we need to wrap Array to DenseVector and pass into BLAS operation inside of solve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, good point. Still, for aaBar and abBar I think we can do it. We can leave aBar as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep their types consistent at interface level, and the following two scenarios:
def solve(
bBar: Double,
bbBar: Double,
abBar: DenseVector,
aaBar: DenseVector,
aBarValues: Array[Double]): NormalEquationSolutionand
def solve(
bBar: Double,
bbBar: Double,
abBarValues: Array[Double],
aaBarValues: Array[Double],
aBar: DenseVector): NormalEquationSolutionare not good. So I left them as it is. Is this OK for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to change the solve signature to intermix arrays and vectors since we just pass aa and ab. Still, this was just a suggestion to avoid a few lines of code, so let's just leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I got what you mean. I'll leave it as is for now. Thanks.
|
Test build #67582 has finished for PR 15621 at commit
|
|
I'll merge it into master. Thanks for reviewing! @sethah |
## What changes were proposed in this pull request? This is follow-up work of apache#15394. Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes apache#15621 from yanboliang/spark-17748.
## What changes were proposed in this pull request? This is follow-up work of apache#15394. Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes apache#15621 from yanboliang/spark-17748.
What changes were proposed in this pull request?
This is follow-up work of #15394.
Reorg some variables of
WeightedLeastSquaresand fix one minor issue ofWeightedLeastSquaresSuite.How was this patch tested?
Existing tests.