Closed
Description
Hello, I am an author of Dask, a library for parallel and distributed computing in Python. I am curious whether there is interest within this community in collaborating on distributing XGBoost on Dask, either for parallel training or for ETL.
There are probably two components of Dask that are relevant for this project:
- A generic system for parallel and distributed computing, built on arbitrary dynamic task scheduling. The relevant APIs here are probably dask.delayed and concurrent.futures.
- A parallel and distributed subset of the Pandas API, dask.dataframe, useful for feature engineering and data pre-processing. It doesn't implement the entire Pandas API, but comes decently close.
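To make the futures-style interface mentioned above concrete, here is a minimal sketch using only the standard library's concurrent.futures. It is not Dask or XGBoost code. The function names and the toy data are hypothetical, and the per-partition work stands in for whatever pre-processing or training step would actually run. Dask's distributed client exposes a similar submit/result pattern, so the same shape of code could in principle be pointed at a cluster rather than a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_sum(rows):
    """Hypothetical per-partition work: sum the values in one partition."""
    return sum(rows)


def parallel_total(partitions, max_workers=4):
    """Submit one task per partition, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() returns a Future immediately; the work runs in the pool.
        futures = [pool.submit(partition_sum, part) for part in partitions]
        # result() blocks until the corresponding task has finished.
        return sum(future.result() for future in futures)


# Toy data standing in for partitions of a larger dataset (illustrative only).
partitions = [[1.0, 2.0], [3.0, 4.0], [5.0]]
total = parallel_total(partitions)
```

The design point is that each partition is processed independently and only small partial results are combined at the end, which is the pattern a dynamic task scheduler is meant to handle.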
Is there interest in collaborating here?