Skip to content

Conversation

@phpisciuneri
Copy link
Contributor

What changes were proposed in this pull request?

Support for dot product with:

  • ml.linalg.Vector
  • ml.linalg.Vectors
  • mllib.linalg.Vector
  • mllib.linalg.Vectors

Why are the changes needed?

Dot product is useful for feature engineering and scoring. BLAS routines are already there, just a wrapper is needed.

Does this PR introduce any user-facing change?

No user facing changes, just some new functionality.

How was this patch tested?

Tests were written and added to the appropriate VectorSuites classes. They can be quickly run with:

sbt "mllib-local/testOnly org.apache.spark.ml.linalg.VectorsSuite"
sbt "mllib/testOnly org.apache.spark.mllib.linalg.VectorsSuite"

@HyukjinKwon HyukjinKwon changed the title [SPARK-29121][ML, MLLIB] Support for dot product operation on Vector(s) [SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s) Sep 18, 2019
* If `size` does not match an [[IllegalArgumentException]] is thrown.
*/
@Since("3.0.0")
def dot(v1: Vector, v2: Vector): Double = BLAS.dot(v1, v2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, do we need this method? BLAS.dot() already exists. I can see an instance method taking a single arg for parity with Pyspark, but this doesn't add much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, fair point. Still you can just call a.dot(b) after the first method is added, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed I can. Lemme fix. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I meant, I don't think there is value in adding this method, because a caller can use a.dot(b) directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I got ya. Yep, I can remove those.

@phpisciuneri
Copy link
Contributor Author

@srowen I'll take a look at what is supported in PySpark and see if there are any more gaps. I would enjoy working on this...

@phpisciuneri
Copy link
Contributor Author

I'll take a look at what is supported in PySpark and see if there are any more gaps. I would enjoy working on this...

I was just putting this out there as volunteering for future work if it is of interest to the community. I don't have anything else to add to this PR unless there is further review.

@SparkQA
Copy link

SparkQA commented Sep 21, 2019

Test build #4880 has finished for PR 25818 at commit 9d2959b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in c7c6b64 Sep 21, 2019
@srowen
Copy link
Member

srowen commented Sep 21, 2019

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants