-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-11084][ML][PYTHON] Check if index can contain non-zero value before binary search #9098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -252,6 +252,16 @@ def test_sparse_vector_indexing(self): | |
| for ind in [7.8, '1']: | ||
| self.assertRaises(TypeError, sv.__getitem__, ind) | ||
|
|
||
| zeros = SparseVector(4, {}) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't this already covered in the previous part of this test with "sv"?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, It should be. I played with different implementations and I just wanted to be sure I didn't break anything. I can remove this if you want.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's OK for now; it's a quick test |
||
| self.assertEqual(zeros[0], 0.0) | ||
| self.assertEqual(zeros[3], 0.0) | ||
| for ind in [4, -5]: | ||
| self.assertRaises(ValueError, zeros.__getitem__, ind) | ||
|
|
||
| empty = SparseVector(0, {}) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like a good test to have. |
||
| for ind in [-1, 0, 1]: | ||
| self.assertRaises(ValueError, empty.__getitem__, ind) | ||
|
|
||
| def test_matrix_indexing(self): | ||
| mat = DenseMatrix(3, 2, [0, 1, 4, 6, 8, 10]) | ||
| expected = [[0, 6], [1, 8], [4, 10]] | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: Is
inds.item(-1)faster thaninds[-1]? I wasn't sure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By itself it is slower than using
__getitem__but it returns standard Python scalar and overall it should be marginally faster when index is a standard scalar as well.If you think that
inds[-1]is better it shouldn't really matter. With crude tests I get ~40 ns difference on average (245 ns for[]and, 205 ns forinds)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'd keep what you have.