-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-6528][ML] Add IDF transformer #5266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #29400 has started for PR 5266 at commit |
|
To test #5269 I'm going to rerun these Jenkins tests as this is a prime example of that bug. |
|
jenkins, retest this please |
|
Test build #29421 has finished for PR 5266 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val idf = udf { (v: Vector) => idfModel.transform(v) }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengxr Here is my concern: Even if we put the transformSchema in the base class, there are too many boilerplates. Same code and same problem in StandardScaler. I think the Vector to Vector transform will be very common, how about pulling them in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary in this PR. We should make utility functions to make the type checks easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it.
|
Test build #29581 has finished for PR 5266 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test looks really complicated to me and hard to read. Could we pre-compute the expected output and validate the results directly?
|
Test build #29931 has finished for PR 5266 at commit
|
|
Test build #29930 has finished for PR 5266 at commit
|
|
Test build #30730 has finished for PR 5266 at commit
|
|
Test build #30742 has finished for PR 5266 at commit
|
|
Test build #30745 has finished for PR 5266 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengxr Should I use map(minDocFreq) instead of getMinDocFreq? I think getMinDocFreq cannot fetch new param from paramMap.
|
Test build #30927 has finished for PR 5266 at commit
|
|
@mengxr ready to review |
|
LGTM. Merged into master. Thanks! |
See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF transformer in ML package. Author: Xusen Yin <[email protected]> Closes apache#5266 from yinxusen/SPARK-6528 and squashes the following commits: 741db31 [Xusen Yin] get param from new paramMap d169967 [Xusen Yin] add final to param and IDF class c9c3759 [Xusen Yin] simplify test suite 5867c09 [Xusen Yin] refine IDF transformer with new interfaces 7727cae [Xusen Yin] Merge branch 'master' into SPARK-6528 4338a37 [Xusen Yin] Merge branch 'master' into SPARK-6528 aef2cdf [Xusen Yin] add doc and group for param 5760b49 [Xusen Yin] fix code style 2add691 [Xusen Yin] fix code style and test 03fbecb [Xusen Yin] remove duplicated code 2aa4be0 [Xusen Yin] clean test suite 4802c67 [Xusen Yin] add IDF transformer and test suite
See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF transformer in ML package. Author: Xusen Yin <[email protected]> Closes apache#5266 from yinxusen/SPARK-6528 and squashes the following commits: 741db31 [Xusen Yin] get param from new paramMap d169967 [Xusen Yin] add final to param and IDF class c9c3759 [Xusen Yin] simplify test suite 5867c09 [Xusen Yin] refine IDF transformer with new interfaces 7727cae [Xusen Yin] Merge branch 'master' into SPARK-6528 4338a37 [Xusen Yin] Merge branch 'master' into SPARK-6528 aef2cdf [Xusen Yin] add doc and group for param 5760b49 [Xusen Yin] fix code style 2add691 [Xusen Yin] fix code style and test 03fbecb [Xusen Yin] remove duplicated code 2aa4be0 [Xusen Yin] clean test suite 4802c67 [Xusen Yin] add IDF transformer and test suite
See SPARK-6528. Add IDF transformer in ML package.