-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-22883] ML test for StructuredStreaming: spark.ml.feature, I-M #20964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the "value" column use java.lang.Double type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector1 === vector2
and the error message looks strange. ==> s"scaled value $vector2 do not equal expected value "$vector1" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have "expected" column" here to compare with ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's necessary for testing that this works with structured streaming. (I can't see how streaming would mess up the correctness of the algorithm.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector1 === vector2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, === is more standard (though both are fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why change Seq[String] to Seq[_], is it for fixing some potential issue ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String is not actually checked because of erasure, so IntelliJ complained with a style warning before this change.
|
Test build #88851 has finished for PR 20964 at commit
|
|
LGTM. 👍 |
|
Thanks! I'll rerun tests since they are stale and merge after they pass. |
|
Test build #4151 has finished for PR 20964 at commit
|
6dff94e to
592670c
Compare
|
I rebased off of master because of the merge warning in the last tests. |
|
Test build #89174 has finished for PR 20964 at commit
|
|
Merging with master |
## What changes were proposed in this pull request? Adds structured streaming tests using testTransformer for these suites: * IDF * Imputer * Interaction * MaxAbsScaler * MinHashLSH * MinMaxScaler * NGram ## How was this patch tested? It is a bunch of tests! Author: Joseph K. Bradley <[email protected]> Closes apache#20964 from jkbradley/SPARK-22883-part2.
This backports #20964 to branch-2.3. ## What changes were proposed in this pull request? Adds structured streaming tests using testTransformer for these suites: * IDF * Imputer * Interaction * MaxAbsScaler * MinHashLSH * MinMaxScaler * NGram ## How was this patch tested? It is a bunch of tests! Author: Joseph K. Bradley <josephdatabricks.com> Author: Joseph K. Bradley <[email protected]> Closes #21042 from jkbradley/SPARK-22883-part2-2.3backport.
What changes were proposed in this pull request?
Adds structured streaming tests using testTransformer for these suites:
How was this patch tested?
It is a bunch of tests!