[SPARK-11207][ML] Add test cases for solver selection of LinearRegres… #9180
Changes from all commits
@@ -77,13 +77,11 @@ object LinearDataGenerator {
       nPoints: Int,
       seed: Int,
       eps: Double = 0.1): Seq[LabeledPoint] = {
-    generateLinearInput(intercept, weights,
-      Array.fill[Double](weights.length)(0.0),
-      Array.fill[Double](weights.length)(1.0 / 3.0),
-      nPoints, seed, eps)}
+    generateLinearInput(intercept, weights, Array.fill[Double](weights.length)(0.0),
+      Array.fill[Double](weights.length)(1.0 / 3.0), nPoints, seed, eps)
+  }

   /**
    *
    * @param intercept Data intercept
    * @param weights Weights to be applied.
    * @param xMean the mean of the generated features. Lots of time, if the features are not properly
@@ -104,24 +102,66 @@ object LinearDataGenerator {
       nPoints: Int,
       seed: Int,
       eps: Double): Seq[LabeledPoint] = {
+    generateLinearInput(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
+  }
+
+  /**
+   * @param intercept Data intercept
+   * @param weights Weights to be applied.
+   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
+   *              standardized, the algorithm with poor implementation will have difficulty
+   *              to converge.
+   * @param xVariance the variance of the generated features.
+   * @param nPoints Number of points in sample.
+   * @param seed Random seed
+   * @param eps Epsilon scaling factor.
+   * @param sparsity The ratio of zero elements. If it is 0.0, LabeledPoints with
+   *                 DenseVector is returned.
+   * @return Seq of input.
+   */
+  @Since("1.6.0")
+  def generateLinearInput(
+      intercept: Double,
+      weights: Array[Double],
+      xMean: Array[Double],
+      xVariance: Array[Double],
+      nPoints: Int,
+      seed: Int,
+      eps: Double,
+      sparsity: Double): Seq[LabeledPoint] = {
+    require(0.0 <= sparsity && sparsity <= 1.0)
     val rnd = new Random(seed)
     val x = Array.fill[Array[Double]](nPoints)(
       Array.fill[Double](weights.length)(rnd.nextDouble()))

+    val sparseRnd = new Random(seed)
     x.foreach { v =>
Member
Once you have …

Member
You can also add the variance of sparsity so that the number of non-zeros will not be constant.
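Note that the per-element check sparseRnd.nextDouble() < sparsity in the hunk below already makes the number of non-zeros per vector random rather than constant. A small standalone sketch, illustrative only and not code from this patch, showing the count vary across vectors:

```scala
// Sketch only: per-element Bernoulli masking yields a Binomial(len, 1 - sparsity)
// number of non-zeros, so the count differs from vector to vector.
import scala.util.Random

object SparsityCountSketch {
  def main(args: Array[String]): Unit = {
    val sparsity = 0.3
    val len = 10
    val sparseRnd = new Random(24)
    val nonZeroCounts = (0 until 5).map { _ =>
      val v = Array.fill(len)(1.0)
      for (i <- 0 until len) {
        if (sparseRnd.nextDouble() < sparsity) v(i) = 0.0
      }
      v.count(_ != 0.0)
    }
    // Prints five counts scattered around len * (1 - sparsity) = 7.
    println(s"non-zero counts per vector: ${nonZeroCounts.mkString(", ")}")
  }
}
```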
       var i = 0
       val len = v.length
       while (i < len) {
-        v(i) = (v(i) - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)
+        if (sparseRnd.nextDouble() < sparsity) {
+          v(i) = 0.0
+        } else {
+          v(i) = (v(i) - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)
+        }
         i += 1
       }
     }
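The non-zero branch above rescales the raw uniform draw: u = rnd.nextDouble() is Uniform(0, 1) with mean 0.5 and variance 1/12, so (u - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i) has mean xMean(i) and variance xVariance(i). With the defaults passed by the first hunk (xMean = 0.0, xVariance = 1.0 / 3.0) this is just a Uniform(-1, 1) feature. A standalone empirical check, illustrative only and not part of the patch:

```scala
// Sketch only: checks that (u - 0.5) * sqrt(12 * variance) + mean, with u ~ Uniform(0, 1),
// produces samples with the requested mean and variance.
import scala.util.Random

object ScalingCheck {
  def main(args: Array[String]): Unit = {
    val targetMean = 0.9
    val targetVariance = 0.7
    val rnd = new Random(42)
    val samples = Array.fill(1000000) {
      (rnd.nextDouble() - 0.5) * math.sqrt(12.0 * targetVariance) + targetMean
    }
    val mean = samples.sum / samples.length
    val variance = samples.map(s => (s - mean) * (s - mean)).sum / samples.length
    println(f"mean = $mean%.4f (target $targetMean), variance = $variance%.4f (target $targetVariance)")
  }
}
```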
     val y = x.map { xi =>
       blas.ddot(weights.length, xi, 1, weights, 1) + intercept + eps * rnd.nextGaussian()
     }
-    y.zip(x).map(p => LabeledPoint(p._1, Vectors.dense(p._2)))
Member
To simplify the following code, do:

    y.zip(x).map { p =>
      if (sparsity == 0.0) {
        LabeledPoint(p._1, Vectors.dense(p._2))
      } else {
        LabeledPoint(p._1, Vectors.dense(p._2).toSparse)
      }
    }
+    y.zip(x).map { p =>
+      if (sparsity == 0.0) {
+        // Return LabeledPoints with DenseVector
+        LabeledPoint(p._1, Vectors.dense(p._2))
+      } else {
+        // Return LabeledPoints with SparseVector
+        LabeledPoint(p._1, Vectors.dense(p._2).toSparse)
+      }
+    }
   }

   /**
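With the new overload in place, a test suite can ask for sparse feature vectors directly. A hedged usage sketch of the signature shown in the diff above; it assumes the spark-mllib artifact is on the classpath, and the object name and constants here are illustrative, not taken from this PR:

```scala
// Sketch only: exercises the sparsity parameter added in the diff above.
import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator

object SparseInputExample {
  def main(args: Array[String]): Unit = {
    val points: Seq[LabeledPoint] = LinearDataGenerator.generateLinearInput(
      intercept = 6.3,
      weights = Array(4.7, 7.2),
      xMean = Array(0.9, -1.3),
      xVariance = Array(0.7, 1.2),
      nPoints = 100,
      seed = 42,
      eps = 0.1,
      sparsity = 0.5)

    // Any sparsity > 0.0 makes every LabeledPoint carry a SparseVector.
    assert(points.forall(_.features.isInstanceOf[SparseVector]))
    // The number of explicitly stored entries varies from point to point.
    println(points.take(5).map(_.features.asInstanceOf[SparseVector].indices.length).mkString(", "))
  }
}
```

Callers that need dense output keep the old behaviour through the existing seven-argument overload, which forwards with sparsity = 0.0.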
How about consolidating this with LinearDataGenerator, and adding sparsity = 1.0 as a param to control whether the features are sparse?

Yes, I also thought that was a good idea. But LinearDataGenerator is used as a static object, so we would have to pass sparsity as a parameter to generateLinearInput, and that method is used by a lot of suites; it would be necessary to change many method references. Therefore it might be better to do this in a separate JIRA. What do you think?

Let's modify the JIRA and do it here. Basically, you can create a LinearDataGenerator with the old signature calling the new API for compatibility.
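For reference, the compatibility pattern settled on here, keeping the old signature as a thin wrapper that forwards to the new API with sparsity = 0.0, is exactly what the first part of the diff does. Below is a minimal standalone sketch of the same pattern; Point and the generator body are placeholders, not Spark code:

```scala
// Sketch of the source-compatibility pattern discussed above: the old arity keeps working
// because it forwards to the new overload with a neutral default (sparsity = 0.0).
import scala.util.Random

object CompatGenerator {
  final case class Point(label: Double, features: Array[Double])

  // New API: takes the extra sparsity parameter.
  def generate(nPoints: Int, seed: Int, eps: Double, sparsity: Double): Seq[Point] = {
    require(0.0 <= sparsity && sparsity <= 1.0)
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      val x = Array.fill(3)(if (rnd.nextDouble() < sparsity) 0.0 else rnd.nextDouble())
      Point(x.sum + eps * rnd.nextGaussian(), x)
    }
  }

  // Old API: signature unchanged, so existing call sites compile as before.
  def generate(nPoints: Int, seed: Int, eps: Double): Seq[Point] =
    generate(nPoints, seed, eps, 0.0)
}
```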