[SPARK-18080][ML][PySpark] Locality Sensitive Hashing (LSH) Python API. #15768
Test build #68135 has finished for PR 15768 at commit
Test build #68136 has finished for PR 15768 at commit
I began to review this, but got sidetracked with a lot of the details we are currently discussing on the original LSH PR.
This can now proceed since http://github.com/apache/spark/pull/15874 is ready to be merged. Sorry for the delay! This will need to slip to 2.2.
Pinging on this: What's a reasonable ETA for updating the PR? Thanks @yanboliang!
@jkbradley I just came back from vacation, and will update this PR before the weekend. Thanks.
Test build #3550 has finished for PR 15768 at commit
Btw, @yanboliang and @Yunni, did you sync? I'm fine with the takeover, but don't want to stomp on toes. Both can be listed as authors when this gets merged. Should we close this issue with the other taking its place?
I'm OK to close this one and glad to help to review #16715, but I think I will have time until next week.
…e Hashing

## What changes were proposed in this pull request?

This pull request includes Python API and examples for LSH. The API changes were based on yanboliang's PR apache#15768 and resolved conflicts and API changes on the Scala API. The examples are consistent with Scala examples of MinHashLSH and BucketedRandomProjectionLSH.

## How was this patch tested?

API and examples are tested using spark-submit:
`bin/spark-submit examples/src/main/python/ml/min_hash_lsh.py`
`bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh.py`

User guide changes are generated and manually inspected:
`SKIP_API=1 jekyll build`

Author: Yun Ni <[email protected]>
Author: Yanbo Liang <[email protected]>
Author: Yunni <[email protected]>

Closes apache#16715 from Yunni/spark-18080.
What changes were proposed in this pull request?
Add `MinHash` and `RandomProjection` Python API.
How was this patch tested?
Add doc tests.
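
For readers unfamiliar with the technique behind the `MinHash` API named above, here is a minimal pure-Python sketch of the MinHash idea. It is not the Spark implementation and does not use the PySpark API. The function names, the modulus, and the seed are hypothetical choices made for illustration only, and items are assumed to be small non-negative integers.

```python
import random

# Modulus for the illustrative hash family; 2**31 - 1 is a Mersenne prime.
# (Assumption of this sketch, not taken from the pull request.)
PRIME = 2**31 - 1


def make_hash_functions(num_hashes, seed=42):
    """Draw (a, b) coefficient pairs for hashes h(x) = (a * (x + 1) + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, hash_functions):
    """Return one minimum hash value per hash function for a non-empty set of ints."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min((a * (x + 1) + b) % PRIME for x in items)
            for a, b in hash_functions]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)
```

Two sets with a large overlap tend to share the same minimum under many of the hash functions, so comparing `estimate_jaccard(minhash_signature(a, hs), minhash_signature(b, hs))` against `exact_jaccard(a, b)` shows how short signatures stand in for the full sets.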