[SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s) #25818
 * If `size` does not match an [[IllegalArgumentException]] is thrown.
 */
@Since("3.0.0")
def dot(v1: Vector, v2: Vector): Double = BLAS.dot(v1, v2)
Actually, do we need this method? BLAS.dot() already exists. I can see an instance method taking a single arg for parity with Pyspark, but this doesn't add much.
It is private to spark, hence the simple wrapping: https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala#L26
Ah right, fair point. Still you can just call a.dot(b) after the first method is added, no?
Yes indeed I can. Lemme fix. Thanks!
Oh, I meant, I don't think there is value in adding this method, because a caller can use a.dot(b) directly.
Ah, I got ya. Yep, I can remove those.
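The outcome of this thread (a public instance method that delegates to a package-private BLAS helper, instead of a two-argument static wrapper) can be sketched roughly as follows. `DotSketch`, `MiniBLAS` and `MiniVector` are hypothetical stand-ins invented for illustration; they are not Spark's actual `BLAS` or `Vector` classes, and the real implementation also handles sparse vectors.

```scala
// Minimal sketch only: MiniBLAS and MiniVector are hypothetical stand-ins,
// not Spark's real BLAS / Vector classes.
object DotSketch {

  // Plays the role of the helper that is not visible to library users.
  private object MiniBLAS {
    def dot(x: Array[Double], y: Array[Double]): Double = {
      // require throws IllegalArgumentException when the sizes differ.
      require(x.length == y.length,
        s"Vector sizes must match: ${x.length} != ${y.length}")
      var sum = 0.0
      var i = 0
      while (i < x.length) {
        sum += x(i) * y(i)
        i += 1
      }
      sum
    }
  }

  final class MiniVector(val values: Array[Double]) {
    def size: Int = values.length

    // Public instance method: callers write a.dot(b), so no separate
    // two-argument static wrapper is needed.
    def dot(other: MiniVector): Double = MiniBLAS.dot(values, other.values)
  }
}
```

With this shape, `a.dot(b)` is the only public entry point, which is the reviewer's argument for dropping the static `dot(v1, v2)` method.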
@srowen I'll take a look at what is supported in PySpark and see if there are any more gaps. I would enjoy working on this...
I was just putting this out there as volunteering for future work if it is of interest to the community. I don't have anything else to add to this PR unless there is further review.
Test build #4880 has finished for PR 25818 at commit
Merged to master
What changes were proposed in this pull request?
Support for dot product with:
- ml.linalg.Vector
- ml.linalg.Vectors
- mllib.linalg.Vector
- mllib.linalg.Vectors
Why are the changes needed?
Dot product is useful for feature engineering and scoring. The BLAS routines already exist; only a wrapper is needed.
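As a rough illustration of the scoring use case, the sketch below computes a linear score as the dot product of a feature vector and a weight vector. It assumes Spark 3.0.0 or later (the version in the `@Since` annotation) with the `spark-mllib-local` artifact on the classpath; `ScoringExample` and the numeric values are made up for the example.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical example object; only Vectors.dense, Vectors.sparse and
// Vector.dot come from Spark.
object ScoringExample {
  // Dense feature vector of size 3.
  val features: Vector = Vectors.dense(1.0, 0.0, 3.0)

  // Sparse weight vector of size 3 with non-zeros at indices 0 and 2.
  val weights: Vector = Vectors.sparse(3, Array(0, 2), Array(0.5, 2.0))

  // Linear score: sum of element-wise products of features and weights.
  val score: Double = features.dot(weights)
}
```

Dense and sparse vectors can be mixed in one call, so callers do not need to convert either operand first.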
Does this PR introduce any user-facing change?
No changes to existing user-facing behavior, just some new functionality.
How was this patch tested?
Tests were written and added to the appropriate `VectorSuites` classes. They can be quickly run with: