Hello,
I'm working on a small proof of concept. I use dask in my project and would like to use the XGBClassifier. I also need a parameter search and, of course, cross-validation mechanisms.
Unfortunately, when fitting the dask_xgboost.XGBClassifier, I get the following error:
Anaconda_3.5.1\envs\prescoring\lib\site-packages\dask_xgboost\core.py", line 97, in _train
AttributeError: 'DataFrame' object has no attribute 'to_delayed'
Although I call .fit() with two dask collections, somewhere along the way the data apparently ends up as a plain pandas.DataFrame, which has no .to_delayed() method.
Here's the code I'm using:
import dask.dataframe as dd
import numpy as np
import pandas as pd
from dask_ml.model_selection import GridSearchCV
from dask_xgboost import XGBClassifier
from distributed import Client
from sklearn.datasets import load_iris

if __name__ == '__main__':
    client = Client()

    data = load_iris()
    x = pd.DataFrame(data=data['data'], columns=data['feature_names'])
    x = dd.from_pandas(x, npartitions=2)
    y = pd.Series(data['target'])
    y = dd.from_pandas(y, npartitions=2)

    estimator = XGBClassifier(objective='multi:softmax', num_class=4)
    grid_search = GridSearchCV(
        estimator,
        param_grid={
            'n_estimators': np.arange(15, 105, 15)
        },
        scheduler='threads'
    )
    grid_search.fit(x, y)

    results = pd.DataFrame(grid_search.cv_results_)
    print(results.to_string())
I use the packages in the following versions:
pandas==0.23.3
numpy==1.15.1
dask==0.20.0
dask-ml==0.11.0
dask-xgboost==0.1.5
Note that I don't get this exception when using sklearn.ensemble.GradientBoostingClassifier.
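For reference, this is the shape of the working variant. The sketch below uses scikit-learn's own GridSearchCV so it stays dependency-light — in my project the GradientBoostingClassifier goes through dask_ml's GridSearchCV instead, but either way it fits without the exception:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

data = load_iris()

# Same search space as above, but with the sklearn estimator.
grid_search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={'n_estimators': np.arange(15, 105, 15)},
    cv=3,
)
grid_search.fit(data['data'], data['target'])
print(grid_search.best_params_)
```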
Any help would be appreciated.
Mateusz