This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Conversation

@mmccarty
Member

@mmccarty mmccarty commented Sep 24, 2019

This change adds support for early stopping to the sklearn interface by passing arguments in the proper format.

  • Classifier
  • Regressor

fixes #38

@mmccarty
Member Author

It would be better to refactor parts of the sklearn API code in xgboost to be more reusable here, rather than duplicating it. Once this settles out, I would like to clean that up.

@mmccarty
Member Author

@TomAugspurger Thanks for taking a look. If folks are good with this approach, I will proceed.

@TomAugspurger
Member

So to summarize the discussion from https://github.com/dask/dask-xgboost/pull/54/files/d10bb355a1d82a89233c58295c2b515d3e39baed#diff-66bc4b86e5e634f64f45b16394051674, we decided that eval_set and sample_weight_eval_set should be in-memory ndarrays, which are sent to each worker? That matches what we do with Hyperband in dask-ml.

Eventually we'll want to support sample_weight, which IIUC will be a dask array the same length as X and y.

@TomAugspurger
Member

Overall, things are looking good here @mmccarty. I think just linting issues now.

@mmccarty
Member Author

mmccarty commented Oct 1, 2019 via email

@mmccarty mmccarty changed the title [WIP] Early stopping support classifier and regressor fit Early stopping support classifier and regressor fit Oct 2, 2019
@mmccarty
Member Author

mmccarty commented Oct 2, 2019

@TomAugspurger CI is passing. I also want to add this to the Regressor. I'm working on that now.

@mmccarty mmccarty force-pushed the fit-early-stopping branch from b48d394 to df3cf12 Compare October 2, 2019 16:29
@mmccarty
Member Author

mmccarty commented Oct 2, 2019

@TomAugspurger @mrocklin All done here, unless there are further comments.

Review thread on this diff context:

        weight=weight,
        nthread=n_jobs,
    )
    for ((data, label), weight) in zip(
Member


Can you check: what happens if the user does eval_set=[(x1, y1), (x2, y2)], sample_weight_eval_set=[weight1]? Do we silently drop the (x2, y2) eval set here, since the lengths don't match?

Would the equivalent mistake raise in xgboost, or do they silently proceed?
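For context on the behavior being questioned: Python's built-in `zip` stops at the shortest input, so a mismatched `sample_weight_eval_set` would truncate without any warning. A minimal illustration in plain Python, using placeholder strings rather than real eval-set data:

```python
# zip() stops at the shortest iterable, so a length mismatch between
# eval_set and sample_weight_eval_set silently drops trailing eval sets.
eval_set = [("x1", "y1"), ("x2", "y2")]  # two eval sets
sample_weight_eval_set = ["weight1"]     # only one weight

paired = list(zip(eval_set, sample_weight_eval_set))
print(paired)  # [(('x1', 'y1'), 'weight1')] -- (x2, y2) is gone
```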

Member Author


I'll check.

Member Author


@TomAugspurger I copied this code from xgboost here and then cleaned it up a bit. The original code will silently proceed. This code will error. I'm going to align it with the original code.

Member Author


Actually, no it doesn't error. They behave the same. I'll push up another unit test to confirm.

Member


Heh, OK.

FWIW, I think raising is the right thing to do... Maybe open an issue with XGBoost to see if they agree? Then you won't need to change anything here.

Member Author


That's what I was wondering. What's the right thing to do? Three options, maybe?

  1. Silently drop
  2. Silently fill right
  3. Raise error
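A sketch of what option 3 could look like, assuming a hypothetical validation helper run before the zip (the names are illustrative, not the PR's actual code):

```python
def validate_eval_weights(eval_set, sample_weight_eval_set):
    """Raise if per-set weights don't line up with eval sets (option 3)."""
    if (sample_weight_eval_set is not None
            and len(sample_weight_eval_set) != len(eval_set)):
        raise ValueError(
            "sample_weight_eval_set has {} entries but eval_set has {}".format(
                len(sample_weight_eval_set), len(eval_set)
            )
        )
    # With no weights given, fill right with None so zip pairs cleanly.
    if sample_weight_eval_set is None:
        sample_weight_eval_set = [None] * len(eval_set)
    return list(zip(eval_set, sample_weight_eval_set))
```

Raising keeps a user typo from silently changing which eval sets drive early stopping, which is the failure mode discussed above.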

Member Author


@TomAugspurger I opened an issue. Good catch!

@TomAugspurger
Member

Alright, let's merge this. We can follow up if xgboost decides to change their behavior.

Thanks @mmccarty!

@TomAugspurger TomAugspurger merged commit 2020ab3 into dask:master Oct 3, 2019
@TomAugspurger
Member

Is there a time when a release would be good for you? I can do one sometime next week if that works.

@mmccarty mmccarty deleted the fit-early-stopping branch October 3, 2019 18:25
@mmccarty
Member Author

mmccarty commented Oct 3, 2019

Great! Thanks! Yeah, sometime next week works!



Development

Successfully merging this pull request may close these issues.

[Feature Request] Added the support of evals for the train function for early stopping