-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-16045][ML][Doc] Spark 2.0 ML.feature: doc update for stopwords and binarizer #13375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #59557 has finished for PR 13375 at commit
|
docs/ml-features.md
Outdated
| `StopWordsRemover` instance. A boolean parameter `caseSensitive` indicates | ||
| if the matches should be case sensitive (false by default). | ||
| the `stopWords` parameter. Default stop words for some languages are provided | ||
| ("danish", "dutch", "english", "finnish", "french", "german", "hungarian", "italian", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Languages should be like Danish, Dutch, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid it will confuse users as currently it does not support StopWordsRemover.loadDefaultStopWords("English") (with Capital E). Maybe we should use language.toLower in loadDefaultStopWords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in that case just clarify that these strings are arguments to the method.
|
Thanks for the review @srowen |
|
Test build #59592 has finished for PR 13375 at commit
|
|
I think that's fine. |
|
@hhbyyh Could you please create a new JIRA for this PR to make it clear in JIRA what it is addressing? Please link it to SPARK-15100. Thanks! |
|
created sub-task: https://issues.apache.org/jira/browse/SPARK-16045 and updated the title. |
|
LGTM. Merged into master and branch-2.0. Thanks! |
… and binarizer ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-16045 2.0 Audit: Update document for StopWordsRemover and Binarizer. ## How was this patch tested? manual review for doc Author: Yuhao Yang <[email protected]> Author: Yuhao Yang <[email protected]> Closes #13375 from hhbyyh/stopdoc. (cherry picked from commit a58f402) Signed-off-by: Xiangrui Meng <[email protected]>
### What changes were proposed in this pull request? This pr aims to upgrade netty from 4.1.92 to 4.1.93. ### Why are the changes needed? 1.v4.1.92 VS v4.1.93 netty/netty@netty-4.1.92.Final...netty-4.1.93.Final 2.The new version brings some bug fix, eg: - Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320)) - OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365)) - Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366)) - HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368)) - Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375)) - H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388)) - Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389)) - HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394)) - Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399)) 3.The release notes as follows: - https://netty.io/news/2023/05/25/4-1-93-Final.html 4.Why not upgrade to `4-1-94-Final` version? Because the return value of the 'threadCache()' (from `PoolThreadCache` to `PoolArenasCache`) method of the netty Inner class used in the 'arrow memory netty' version '12.0.1' has changed and belongs to break change, let's wait for the upgrade of the 'arrow memory netty' before upgrading to the '4-1-94-Final' version. The reference is as follows: https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164 https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA. Closes #41681 from panbingkun/upgrade_netty. Authored-by: panbingkun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-16045
2.0 Audit: Update document for StopWordsRemover and Binarizer.
How was this patch tested?
manual review for doc