Skip to content

Conversation

@yanboliang
Copy link
Contributor

@yanboliang yanboliang commented Oct 25, 2016

What changes were proposed in this pull request?

This is follow-up work of #15394.
Reorg some variables of WeightedLeastSquares and fix one minor issue of WeightedLeastSquaresSuite.

How was this patch tested?

Existing tests.

@yanboliang
Copy link
Contributor Author

@sethah I wish you would not mind, I made some reorg for variables of WeightedLeastSquares. Since we use bBar, bbBar, aBar, abBar, aaBar in standardized space always, I put the standardization code and variable declaration together. Then if developers would like to know the detail of each variable, they only need to refer one place which I think is more clear to be understand. Meanwhile, I improved some naming and fix the minor issue in WeightedLeastSquaresSuite. I wish you like this change and open to hear your thoughts. If you think it's not necessary, I can also revert that part. Thanks.

}

/** Construct A^T^ A from summary statistics. */
/** Construct A^T^ A (append bias if necessary). */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary statistics are in original space, here we construct A^T^ A from standardized space.

@SparkQA
Copy link

SparkQA commented Oct 25, 2016

Test build #67506 has finished for PR 15621 at commit 99ced09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// scale aBar to standardized space in-place
while (i < numFeatures) {
if (aStd(i) == 0.0) {
_aBar.values(i) = 0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General suggestion, the following two are not equivalent:

while (j < numFeatures) {
  _abar.values(i) /= aStd(i)
}
val _abarValues = _abar.values
while (j < numFeatures) {
  _abarValues(i) /= aStd(i)
}

This is because of how scala compiles class member vals, you are actually accessing a getter method when you call _abar.values. For speed, I think we should revert this back to making explicit pointers to the values. In this case, this probably isn't a big deal and likely wouldn't matter for anything but aaBar since it's k by k (roughly) in dimension. Still, I don't think it's a bad idea. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I total agree. I thought the impact of this issue is trivial, but making implicit pointer definitely should be better. I made update for all variables. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably right. The only thing it might have a real impact is aaBar, but since we use while loops for speed, it seems better to also use this optimization as well.

}
j += 1
}
val bBar = summary.bBar / bStd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the names bBarStd and bbBarStd here, as I think they're more descriptive. But it is not a strong preference so I'll leave it up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use bBar, bbBar, aBar, abBar, aaBar in standardized space always, so I did not append Std as suffix for all variables. If we only add suffix for bBar and bbBar, developers may misinterpret that other variables are not in standardized space.

aaBarValues(p) = 0.0
// scale abBar to standardized space in-place
while (i < numFeatures) {
if (aStd(i) == 0.0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar thing here, you're calling aStd.apply(i) which in turn calls an internal getter for aStd.values which then indexes the array.

Copy link
Contributor

@sethah sethah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing, otherwise LGTM.

j += 1
_aBar
}
val aBarValues = aBar.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just make the above be val aBarValues = { ... } returning _aBarValues from the code block instead of _abar. Since we make a pointer to aBar but then only use it to get its values.

Copy link
Contributor Author

@yanboliang yanboliang Oct 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass aBar to NormalEquationSolver.solve(), so we can't remove the declaration of aBar. Or other we can convert the argument type of NormalEquationSolver.solve() from DenseVector to Array, but we need to wrap Array to DenseVector and pass into BLAS operation inside of solve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good point. Still, for aaBar and abBar I think we can do it. We can leave aBar as it is.

Copy link
Contributor Author

@yanboliang yanboliang Oct 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep their types consistent at interface level, and the following two scenarios:

def solve(
      bBar: Double,
      bbBar: Double,
      abBar: DenseVector,
      aaBar: DenseVector,
      aBarValues: Array[Double]): NormalEquationSolution

and

def solve(
      bBar: Double,
      bbBar: Double,
      abBarValues: Array[Double],
      aaBarValues: Array[Double],
      aBar: DenseVector): NormalEquationSolution

are not good. So I left them as it is. Is this OK for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to change the solve signature to intermix arrays and vectors since we just pass aa and ab. Still, this was just a suggestion to avoid a few lines of code, so let's just leave it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got what you mean. I'll leave it as is for now. Thanks.

@SparkQA
Copy link

SparkQA commented Oct 26, 2016

Test build #67582 has finished for PR 15621 at commit 4b76649.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

I'll merge it into master. Thanks for reviewing! @sethah

@asfgit asfgit closed this in 312ea3f Oct 26, 2016
@yanboliang yanboliang deleted the spark-17748 branch October 26, 2016 16:31
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?
This is follow-up work of apache#15394.
Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#15621 from yanboliang/spark-17748.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
This is follow-up work of apache#15394.
Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#15621 from yanboliang/spark-17748.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants