[SPARK-14050] [ML] Add multiple languages support and additional methods for Stop Words Remover #12843
After updating English stop words list, "d" is a stop word.
Test build #57535 has finished for PR 12843 at commit
Test build #57536 has finished for PR 12843 at commit
   */
  val caseSensitive: BooleanParam = new BooleanParam(this, "caseSensitive",
-   "whether to do case-sensitive comparison during filtering")
+   "whether to do a case-sensitive comparison over the stop stop words")
"stop stop" --> "stop"
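To illustrate what the `caseSensitive` flag in the diff above controls, here is a minimal plain-Python sketch of the two comparison modes. It is not Spark's implementation: the helper `remove_stop_words` and its parameters are hypothetical names chosen for this example, and the token and stop word lists are made up.

```python
from typing import Iterable, List


def remove_stop_words(tokens: Iterable[str],
                      stop_words: Iterable[str],
                      case_sensitive: bool = False) -> List[str]:
    """Drop tokens found in stop_words (hypothetical helper, not Spark's code)."""
    if case_sensitive:
        # Exact match only: "The" is kept when the list contains just "the".
        stop_set = set(stop_words)
        return [t for t in tokens if t not in stop_set]
    # Case-insensitive: compare lowercased forms, but return the original tokens.
    lowered = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in lowered]


tokens = ["The", "d", "Spark", "the", "remover"]
stops = ["the", "d"]
print(remove_stop_words(tokens, stops))                       # ['Spark', 'remover']
print(remove_stop_words(tokens, stops, case_sensitive=True))  # ['The', 'Spark', 'remover']
```

With the default (case-insensitive) mode, both "The" and "the" are removed; with `case_sensitive=True`, only the exact lowercase form is.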
Should there be a unit test that iterates through StopWordsRemover.supportedLanguages, loads the stop words for each language, and checks that every list is non-empty? Other than those small items, this looks good to me.
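The shape of the suggested test could look roughly like the sketch below. It is plain Python and does not touch Spark: `supported_languages`, `load_default_stop_words`, and the placeholder word lists are all assumptions standing in for the real loader and the bundled resource files.

```python
from typing import Dict, List

# Placeholder data: a tiny in-memory stand-in for the bundled stop word files.
_PLACEHOLDER_LISTS: Dict[str, List[str]] = {
    "english": ["a", "the", "d"],
    "french": ["le", "la", "les"],
}


def supported_languages() -> List[str]:
    """Hypothetical counterpart of StopWordsRemover.supportedLanguages."""
    return sorted(_PLACEHOLDER_LISTS)


def load_default_stop_words(language: str) -> List[str]:
    """Hypothetical loader; rejects languages that are not supported."""
    if language not in _PLACEHOLDER_LISTS:
        raise ValueError(f"unsupported language: {language}")
    return list(_PLACEHOLDER_LISTS[language])


def check_all_languages_load() -> None:
    """Load every supported language and require a non-empty list."""
    for language in supported_languages():
        words = load_default_stop_words(language)
        assert len(words) > 0, f"empty stop word list for {language}"


check_all_languages_load()
```

Iterating over the supported-language set, instead of hard-coding names, means a newly added language is covered by the same check without changing the test.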
Test build #57769 has finished for PR 12843 at commit
LGTM pending tests
Test build #2974 has finished for PR 12843 at commit
Test build #57923 has finished for PR 12843 at commit
Merged into master and branch-2.0.
…ds for Stop Words Remover

## What changes were proposed in this pull request?

This PR continues the work from #11871 with the following changes:
* load English stopwords as default
* convert stopwords to list in Python
* update some tests and doc

## How was this patch tested?

Unit tests.

Closes #11871

cc: burakkose srowen

Author: Burak Köse <[email protected]>
Author: Xiangrui Meng <[email protected]>
Author: Burak KOSE <[email protected]>

Closes #12843 from mengxr/SPARK-14050.

(cherry picked from commit e20cd9f)
Signed-off-by: Xiangrui Meng <[email protected]>
What changes were proposed in this pull request?
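This PR continues the work from #11871 with the following changes:
* load English stopwords as default
* convert stopwords to list in Python
* update some tests and doc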
How was this patch tested?
Unit tests.
Closes #11871
cc: @burakkose @srowen