

yurymalkov
Member

An overflow was caused by using uint32 in a pointer offset calculation, so large indexes were not saved properly.
This commit changes uint32 -> uint64, and now the test passes.
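To see why a 32-bit offset breaks at this scale, here is a minimal Python sketch of the arithmetic. It is not hnswlib's code. It assumes a flat buffer where element `i` starts at byte `i * bytes_per_element`, and it counts only the `d` float32 values per element; the real per-element record in hnswlib also holds links and a label, so the wrap would happen at an even smaller index. The helpers `offset_u32` and `offset_u64` are hypothetical.

```python
# Hypothetical illustration: byte offset of an element inside a flat,
# contiguous buffer, computed with 32-bit vs 64-bit unsigned arithmetic.
UINT32_MOD = 2**32
UINT64_MOD = 2**64


def offset_u32(index: int, bytes_per_element: int) -> int:
    """Offset as a uint32 computation would produce it (wraps modulo 2**32)."""
    return (index * bytes_per_element) % UINT32_MOD


def offset_u64(index: int, bytes_per_element: int) -> int:
    """Offset as a uint64 computation would produce it (wraps only past 2**64)."""
    return (index * bytes_per_element) % UINT64_MOD


# Assumption: each element stores only its d float32 values (d * 4 bytes).
d = 400
bytes_per_element = d * 4  # 1600 bytes
last_index = 5_000_000 - 1

true_offset = last_index * bytes_per_element
print(f"true offset:   {true_offset:,}")
print(f"uint32 offset: {offset_u32(last_index, bytes_per_element):,}")
print(f"uint64 offset: {offset_u64(last_index, bytes_per_element):,}")

# First index whose offset no longer fits in 32 bits (ceiling division).
first_bad = -(-UINT32_MOD // bytes_per_element)
print(f"first index that wraps with uint32: {first_bad:,}")
```

Under these assumptions, 5,000,000 elements of 1,600 bytes need about 8 GB, nearly twice what a 32-bit offset can address, so offsets for later elements wrap around and point back into the start of the buffer.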

Thanks to Kai Wohlfahrt (https://www.linkedin.com/in/kwohlfahrt/ ) for reporting the issue.

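Script to reproduce:

```python
import pickle

import numpy as np
import hnswlib

n = 5_000_000  # number of vectors; large enough for the index data to exceed 2**32 bytes
d = 400        # dimensionality

# Random values in the first 4 dimensions, zero-padded up to d dimensions
data = np.float32(np.random.random((n, 4)))
data = np.pad(data, ((0, 0), (0, d - 4)))
print(data.shape)

# Build the index
ann = hnswlib.Index("l2", d)
ann.init_index(n, ef_construction=10, M=16)
ann.add_items(data)

# Recall@1 before pickling: ideally each vector is returned as its own nearest neighbor
total_elements_to_test = n
print("testing recall")
labels, _ = ann.knn_query(data[:total_elements_to_test], k=1)
print("recall:", np.sum(labels.reshape(-1) == np.arange(0, total_elements_to_test)) / total_elements_to_test)

# Round-trip the index through pickle
pickled = pickle.dumps(ann)
ann = pickle.loads(pickled)

# Recall@1 after pickling; with the uint32 overflow, recall drops because large indexes are not saved properly
print("testing recall after pickle")
labels, _ = ann.knn_query(data[:total_elements_to_test], k=1)
print("recall:", np.sum(labels.reshape(-1) == np.arange(0, total_elements_to_test)) / total_elements_to_test)

# The pickle must at least contain the raw vectors
print(f"size should be at least {data.size * data.itemsize:,} bytes")
print(f"pickle size: {len(pickled):,} bytes")
```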

yurymalkov merged commit bcf0dc6 into develop on Feb 12, 2022.