@hendrikmuhs hendrikmuhs commented Jul 29, 2022

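By using bitsets instead of lists of longs, item sets can be de-duplicated faster. Each item is assigned a bit position according to its rank among the top items (ordered by count), and an item set is represented by the bits of the items it contains.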
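The idea can be sketched as follows. This is a minimal illustration, not the code from this PR: the class `ItemSetBitSets` and its methods are hypothetical names, and items are assumed to be identified by `long` ids with known counts. Bit 0 goes to the most frequent item, and a `HashSet<BitSet>` replaces comparisons of id lists, relying on the value-based `equals`/`hashCode` of `java.util.BitSet`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not a class from this PR.
final class ItemSetBitSets {

    // Assigns bit positions by descending count: the most frequent item gets bit 0.
    static Map<Long, Integer> bitPositions(Map<Long, Long> countsByItemId) {
        List<Map.Entry<Long, Long>> entries = new ArrayList<>(countsByItemId.entrySet());
        // Sort by count (descending); break ties by item id so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Long.compare(a.getKey(), b.getKey());
        });
        Map<Long, Integer> positions = new LinkedHashMap<>();
        for (Map.Entry<Long, Long> entry : entries) {
            positions.put(entry.getKey(), positions.size());
        }
        return positions;
    }

    // Encodes an item set as a bitset. Assumes every id has an assigned position.
    static BitSet toBitSet(long[] itemIds, Map<Long, Integer> positions) {
        BitSet bits = new BitSet(positions.size());
        for (long id : itemIds) {
            bits.set(positions.get(id));
        }
        return bits;
    }

    // Returns true if the item set was not seen before, false if it is a duplicate.
    static boolean addIfAbsent(Set<BitSet> seen, long[] itemIds, Map<Long, Integer> positions) {
        return seen.add(toBitSet(itemIds, positions));
    }
}
```

Because the encoding ignores the order in which ids are listed, `{10, 20}` and `{20, 10}` map to the same bitset and the second insertion is rejected as a duplicate.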

[Image: Screenshot_20220729_153822]

Notes:

  • the bitset might also be useful for transactions: it can speed up the lookup that checks whether a candidate set matches a transaction
  • bitsets reduce memory requirements (the memory used to remember collected sets)
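
As a sketch of the first note, assuming both the transaction and the candidate set are encoded with the same bit positions: a candidate matches a transaction when every bit set in the candidate is also set in the transaction. The class and method names below are hypothetical and only illustrate the check using `java.util.BitSet`.

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; not a class from this PR.
final class TransactionMatch {

    // Returns true if the transaction contains every item of the candidate set.
    static boolean containsAll(BitSet transaction, BitSet candidate) {
        // Copy the candidate so the caller's bitset is not modified.
        BitSet missing = (BitSet) candidate.clone();
        // Clear every bit that the transaction has; what remains is missing from it.
        missing.andNot(transaction);
        return missing.isEmpty();
    }
}
```

The check works on whole 64-bit words rather than on individual item ids, which is where a speedup over list-based lookups could come from.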

@elasticsearchmachine elasticsearchmachine added the Team:ML label Jul 29, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @hendrikmuhs, I've created a changelog YAML for you.

Member

@benwtrent benwtrent left a comment

Those numbers look great! I like some of the refactors as well. Just some minor comments.

@hendrikmuhs hendrikmuhs merged commit e64eb8c into elastic:main Aug 1, 2022
@hendrikmuhs hendrikmuhs deleted the frequent-items-bitset3 branch August 1, 2022 13:16

Labels

>enhancement, :ml, Team:ML, v8.5.0

3 participants