[SPARK-5384][mllib] Vectors.sqdist returns inconsistent results for sparse/dense vectors when the vectors have different lengths
JIRA issue: https://issues.apache.org/jira/browse/SPARK-5384
Currently `Vectors.sqdist` returns inconsistent results for sparse/dense vectors when the vectors have different lengths. Please refer to the JIRA issue for a sample.
PR scope:
Unify the sqdist logic for dense/sparse vectors and fix the inconsistency. Also remove the possible sparse-to-dense conversion in the original code.
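To illustrate the idea of avoiding the sparse-to-dense conversion, here is a minimal sketch in plain Scala. It is not the code from this PR: the object name `SqdistSketch`, the flattened `(size, indices, values)` representation of a sparse vector, and the error message are all assumptions made for illustration. It also assumes the sparse indices are sorted in strictly increasing order, and it rejects vectors of different sizes up front.

```scala
// Hypothetical sketch, not the implementation in Spark MLlib.
object SqdistSketch {

  /** Squared Euclidean distance between a sparse vector, given as
    * (size, indices, values), and a dense array. The sparse vector is
    * walked in place and never copied into a dense array.
    * Assumes `indices` is sorted in strictly increasing order.
    */
  def sqdist(
      size: Int,
      indices: Array[Int],
      values: Array[Double],
      dense: Array[Double]): Double = {
    // Assumed behavior: reject vectors whose sizes differ.
    require(size == dense.length,
      s"Vector dimensions do not match: $size vs ${dense.length}")

    var sum = 0.0
    var k = 0 // cursor into the sparse arrays
    var i = 0 // cursor into the dense array
    while (i < dense.length) {
      // Take the stored sparse value at position i, or 0.0 if absent.
      val sparseValue =
        if (k < indices.length && indices(k) == i) {
          val v = values(k)
          k += 1
          v
        } else {
          0.0
        }
      val diff = sparseValue - dense(i)
      sum += diff * diff
      i += 1
    }
    sum
  }
}
```

With this sketch, `SqdistSketch.sqdist(3, Array(0, 2), Array(1.0, 3.0), Array(1.0, 2.0, 5.0))` compares `[1.0, 0.0, 3.0]` against `[1.0, 2.0, 5.0]`, and a size mismatch fails the `require` check instead of producing a silently different result.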
For reviewers:
Maybe we should first discuss what's the correct behavior.
1. Vectors for sqdist must have the same length, like in breeze?
2. If they can have different lengths, what's the correct result for sqdist? (Should the extra elements be included in the calculation?)
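To make question 2 concrete, the two possible readings can be sketched in plain Scala as follows. Both function names are hypothetical and neither is claimed to be what `Vectors.sqdist` did before or after this PR; the sketch only shows that the two readings give different answers for the same input.

```scala
// Hypothetical sketch of two possible semantics for vectors of different lengths.
object LengthMismatchSketch {

  // Reading A: ignore the extra elements and compare only the common prefix.
  def sqdistTruncated(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Reading B: include the extra elements by padding the shorter vector with zeros.
  def sqdistPadded(a: Array[Double], b: Array[Double]): Double =
    a.zipAll(b, 0.0, 0.0).map { case (x, y) => (x - y) * (x - y) }.sum
}
```

For `a = [1.0, 2.0]` and `b = [1.0, 2.0, 3.0]`, reading A gives 0.0 while reading B gives 9.0, which is the kind of ambiguity that requiring equal lengths (option 1) would avoid.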
I'll update the PR with more optimization and additional unit tests afterwards. Thanks.
Author: Yuhao Yang <[email protected]>
Closes apache#4183 from hhbyyh/fixDouble and squashes the following commits:
1f17328 [Yuhao Yang] limit PR scope to size constraints only
54cbf97 [Yuhao Yang] fix Vectors.sqdist inconsistence