@feynmanliang

TODOs:

  • Switch to 1 vs all when featureArity is too large (heuristic here?)
  • Propose and iterate over subset splits efficiently. For each proposed split we currently add and remove the ImpurityAggregatorSingle of every category from the left and right ImpurityStats, when a single add/remove between left and right would suffice (ditto for leftCount and rightCount)
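
A rough sketch of what the second TODO could look like, written in Python rather than the PR's Scala and not taken from this branch. It assumes each category's statistics are a plain list of per-class label counts (a stand-in for one ImpurityAggregatorSingle); the function name `gray_code_splits` and the bitmask encoding are hypothetical. Walking the subsets in Gray-code order means consecutive splits differ by exactly one category, so each step is a single add on one side and a single remove on the other.

```python
from typing import Iterator, List, Tuple


def gray_code_splits(
    category_counts: List[List[float]],
) -> Iterator[Tuple[int, List[float], List[float]]]:
    """Yield (left_mask, left_stats, right_stats) for every non-trivial split.

    category_counts[c] holds the per-class label counts of category c.
    Bit c of left_mask is set when category c goes to the left child.
    Assumes at least one category.
    """
    k = len(category_counts)
    num_classes = len(category_counts[0])
    left = [0.0] * num_classes
    # Everything starts on the right.
    right = [float(sum(row[j] for row in category_counts))
             for j in range(num_classes)]
    prev_mask = 0
    # The last category always stays on the right, so each partition is
    # visited once: 2^(k-1) - 1 splits in total.
    for i in range(1, 1 << (k - 1)):
        mask = i ^ (i >> 1)  # i-th Gray code
        flipped = (mask ^ prev_mask).bit_length() - 1  # the one changed bit
        # +1 if the category moved to the left, -1 if it moved back right.
        sign = 1.0 if (mask >> flipped) & 1 else -1.0
        for j in range(num_classes):
            delta = sign * category_counts[flipped][j]
            left[j] += delta
            right[j] -= delta
        prev_mask = mask
        yield mask, list(left), list(right)
```

The same single-category update would apply to leftCount and rightCount. The copies in the `yield` are only there so callers can keep the results; a caller that scores each split immediately would not need them.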

@feynmanliang feynmanliang force-pushed the dt-unordered-categorical branch from 286573c to 2829936 Compare November 14, 2015 22:44
@feynmanliang
Author

@jkbradley I rewrote this to consider all subsets when splitting unordered categorical features. Do you mind reviewing? Also, see the TODOs in the PR description: can those wait for another PR, or do you want them in this one?

@jkbradley
Owner

Thanks! I'll take a look.

Switch to 1 vs all when featureArity is too large (heuristic here?)

Definitely should not do it here. I'm not sure if this will be better than the current heuristic of imposing an ordering.
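
For readers unfamiliar with the ordering idea, here is a minimal Python sketch of one common form of it for binary labels. It is an illustration under stated assumptions, not the code in this repository: the function name `ordered_splits` and the `(negatives, positives)` input layout are hypothetical. Categories are sorted by their positive-label rate, and only prefixes of that order are proposed as left subsets, giving k - 1 candidate splits instead of 2^(k-1) - 1.

```python
from typing import List, Set, Tuple


def ordered_splits(category_counts: List[Tuple[float, float]]) -> List[Set[int]]:
    """Return candidate left subsets after imposing an order on categories.

    category_counts[c] = (negatives, positives) for category c.
    """
    def positive_rate(c: int) -> float:
        neg, pos = category_counts[c]
        total = neg + pos
        # Treat an empty category as rate 0.0 (an arbitrary choice here).
        return pos / total if total else 0.0

    order = sorted(range(len(category_counts)), key=positive_rate)
    # Each proper, non-empty prefix of the ordering is one candidate split.
    return [set(order[:i]) for i in range(1, len(order))]
```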

Propose and iterate over subset splits efficiently. For each proposed split we currently add and remove the ImpurityAggregatorSingle of every category from the left and right ImpurityStats, when a single add/remove between left and right would suffice (ditto for leftCount and rightCount)

I agree we should wait on this.

Owner

Remove "possibly"

Author

OK

@jkbradley
Owner

That's all for now. My comments are just cleanups, but I think it can be simplified a bit. Thanks!

@feynmanliang
Author

@jkbradley Updated

Owner

This modifies fullImpurityAgg. It should create a copy first (or share one copy for the whole loop and overwrite it on each iteration).
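
A small Python sketch of the second option (one shared copy, overwritten on each iteration). It is illustrative only and not the Scala in this PR: `full_stats` stands in for fullImpurityAgg as a list of per-class counts, Gini is assumed as the impurity measure, and the names `gini` and `best_split_index` are hypothetical.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini impurity of a vector of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_split_index(full_stats: Sequence[float],
                     left_candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate with the lowest weighted impurity.

    full_stats is never modified. Assumes the node is non-empty.
    """
    total = sum(full_stats)
    right = list(full_stats)  # one scratch copy shared by the whole loop
    best_index, best_impurity = -1, float("inf")
    for i, left in enumerate(left_candidates):
        right[:] = full_stats  # overwrite the scratch copy, no new allocation
        for j, c in enumerate(left):
            right[j] -= c  # right = full - left, computed in the scratch copy
        weighted = (sum(left) * gini(left) + sum(right) * gini(right)) / total
        if weighted < best_impurity:
            best_index, best_impurity = i, weighted
    return best_index
```

Subtracting directly from `full_stats` inside the loop would make every later candidate start from already-reduced totals, which is the bug the comment above points at.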

Author

OK

@jkbradley
Owner

@feynmanliang Thank you for the updates! Just a couple new comments.

@feynmanliang
Author

@jkbradley Updated

@jkbradley
Owner

Thanks for updating! This LGTM. I'll merge it.

jkbradley added a commit that referenced this pull request Dec 1, 2015
Implements chooseUnorderedCategoricalSplit
@jkbradley jkbradley merged commit 1ed14f5 into jkbradley:dt-features Dec 1, 2015