
[FEA] get dask_cudf.Series from xgb.dask.predict() #5425

@rnyak


Is your feature request related to a problem? Please describe.

I would like to get the result of prediction = xgb.dask.predict(client, output, dtrain) as a dask_cudf.Series instead of a dask.array.core.Array. I want to keep the prediction results on the GPU, because I run predictions on a large number of records.

I am using a RAPIDS 0.13 nightly build in a conda environment, with Dask 2.12.0. Here is a minimal reproducible example:

import xgboost as xgb
import cudf, dask_cudf

cdf = cudf.DataFrame()
cdf['day'] = [15, 10, 20, 20, 21, 25, 28, 29]
cdf['hour'] = [19, 20, 20, 21, 18, 12, 15, 13]
cdf['passenger_count'] = [1, 1, 2, 2, 3, 3, 4, 2]
cdf['fare_amount'] = [5.0, 3.5, 12.5, 4.5, 9.0, 5.0, 3.5, 7.5]
ddf = dask_cudf.from_cudf(cdf, npartitions=2)

# prepare dtrain and call xgboost
# .... (elided: X_train, Y_train and params are prepared here;
#       `client` is an existing dask.distributed Client)
dtrain = xgb.dask.DaskDMatrix(client, X_train, Y_train)
trained_model = xgb.dask.train(client, params, dtrain,
                               num_boost_round=100, evals=[(dtrain, 'train')])

# prepare dtest
# .... (elided: X_test and Y_test are prepared here)
dtest = xgb.dask.DaskDMatrix(client, X_test, Y_test)

# do prediction
prediction = xgb.dask.predict(client, trained_model['booster'], dtest)

type(prediction)
# dask.array.core.Array

Describe the solution you'd like
Keep the prediction results on the GPU when predicting on a large number of records, i.e.:

prediction = xgb.dask.predict(client, output, dtrain)
type(prediction)
# desired: dask_cudf.core.Series
