This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Conversation

@mmccarty
Member

@mmccarty mmccarty commented Sep 24, 2019

This change adds support for early stopping to the sklearn interface by passing arguments in the proper format.

  • Classifier
  • Regressor

fixes #38

@mmccarty
Member Author

It would be better to refactor parts of the sklearn API code in xgboost to be more reusable here, rather than duplicating it. Once this settles out, I would like to clean that up.

@mmccarty
Member Author

@TomAugspurger Thanks for taking a look. If folks are good with this approach, I will proceed.

@TomAugspurger
Member

So to summarize the discussion from https://github.com/dask/dask-xgboost/pull/54/files/d10bb355a1d82a89233c58295c2b515d3e39baed#diff-66bc4b86e5e634f64f45b16394051674, we decided that eval_set and sample_weight_eval_set should be in-memory ndarrays, which are sent to each worker? That matches what we do with Hyperband in dask-ml.

Eventually we'll want to support sample_weight, which IIUC will be a dask array the same length as X and y.

@TomAugspurger
Member

Overall, things are looking good here @mmccarty. I think just linting issues now.

@mmccarty
Member Author

mmccarty commented Oct 1, 2019 via email

@mmccarty mmccarty changed the title [WIP] Early stopping support classifier and regressor fit Early stopping support classifier and regressor fit Oct 2, 2019
@mmccarty
Member Author

mmccarty commented Oct 2, 2019

@TomAugspurger CI is passing. I also want to add this to the Regressor. I'm working on that now.

@mmccarty mmccarty force-pushed the fit-early-stopping branch from b48d394 to df3cf12 Compare October 2, 2019 16:29
@mmccarty
Member Author

mmccarty commented Oct 2, 2019

@TomAugspurger @mrocklin All done here, unless there are further comments.

Review thread on this diff context:

        weight=weight,
        nthread=n_jobs,
    )
    for ((data, label), weight) in zip(
Member


Can you check: what happens if the user does eval_set=[(x1, y1), (x2, y2)], sample_weight_eval_set=[weight1]? Do we silently drop the (x2, y2) eval set here, since the lengths don't match?

Would the equivalent mistake raise in xgboost, or do they silently proceed?
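For context on the behavior being questioned: Python's built-in `zip` stops at the shortest input, so a mismatched `sample_weight_eval_set` would truncate without any warning. A minimal illustration in plain Python, using placeholder strings rather than real eval-set data:

```python
# zip() stops at the shortest iterable, so a length mismatch between
# eval_set and sample_weight_eval_set silently drops trailing eval sets.
eval_set = [("x1", "y1"), ("x2", "y2")]  # two eval sets
sample_weight_eval_set = ["weight1"]     # only one weight

paired = list(zip(eval_set, sample_weight_eval_set))
print(paired)  # [(('x1', 'y1'), 'weight1')] -- (x2, y2) is gone
```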

Member Author


I'll check.

Member Author


@TomAugspurger I copied this code from xgboost here and then cleaned it up a bit. The original code will silently proceed. This code will error. I'm going to align it with the original code.

Member Author


Actually, no it doesn't error. They behave the same. I'll push up another unit test to confirm.

Member


Heh, OK.

FWIW, I think raising is the right thing to do... Maybe open an issue with XGBoost to see if they agree? Then you won't need to change anything here.

Member Author


That's what I was wondering. What's the right thing to do? Three options, maybe?

  1. Silently drop
  2. Silently fill right
  3. Raise error
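A sketch of what option 3 could look like, assuming a hypothetical validation helper run before the zip (the names are illustrative, not the PR's actual code):

```python
def validate_eval_weights(eval_set, sample_weight_eval_set):
    """Raise if per-set weights don't line up with eval sets (option 3)."""
    if (sample_weight_eval_set is not None
            and len(sample_weight_eval_set) != len(eval_set)):
        raise ValueError(
            "sample_weight_eval_set has {} entries but eval_set has {}".format(
                len(sample_weight_eval_set), len(eval_set)
            )
        )
    # With no weights given, fill right with None so zip pairs cleanly.
    if sample_weight_eval_set is None:
        sample_weight_eval_set = [None] * len(eval_set)
    return list(zip(eval_set, sample_weight_eval_set))
```

Raising keeps a user typo from silently changing which eval sets drive early stopping, which is the failure mode discussed above.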

Member Author


@TomAugspurger I opened an issue. Good catch!

@TomAugspurger
Member

Alright, let's merge this. We can follow up if xgboost decides to change their behavior.

Thanks @mmccarty!

@TomAugspurger TomAugspurger merged commit 2020ab3 into dask:master Oct 3, 2019
@TomAugspurger
Member

Is there a time when a release would be good for you? I can do one sometime next week if that works.

@mmccarty mmccarty deleted the fit-early-stopping branch October 3, 2019 18:25
@mmccarty
Member Author

mmccarty commented Oct 3, 2019

Great! Thanks! Yeah, sometime next week works!



Development

Successfully merging this pull request may close these issues.

[Feature Request] Added the support of evals for the train function for early stopping