Description
All of the following classes use n_neighbors:
- ADASYN
- OneSidedSelection
- NeighbourhoodCleaningRule
- NearMiss
- AllKNN
- RepeatedEditedNearestNeighbours
- EditedNearestNeighbours
- CondensedNearestNeighbour

SMOTE and all of its variants, on the other hand, use k_neighbors.
This inconsistency poses a problem for duck typing and for pipelines, for example when tuning the neighbour count through GridSearchCV:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import ADASYN
from imblearn.over_sampling import SMOTE

X, y = ...

smote = SMOTE()
adasyn = ADASYN()
logreg = LogisticRegression()

smote_pipe = Pipeline([('sampler', smote), ('classifier', logreg)])
adasyn_pipe = Pipeline([('sampler', adasyn), ('classifier', logreg)])

params = dict(sampler__n_neighbors=range(3, 6))
smote_grid = GridSearchCV(smote_pipe, params)
adasyn_grid = GridSearchCV(adasyn_pipe, params)

# fails because SMOTE uses k_neighbors instead of n_neighbors;
# I am forced to make a new params dict
smote_grid.fit(X, y)

# succeeds
adasyn_grid.fit(X, y)
```
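As the comments note, the current workaround is to keep one parameter grid per sampler, spelled to match each sampler's own parameter name. A minimal sketch, reusing the pipelines defined above:

```python
# Workaround: separate grids, one per neighbour-parameter spelling.
smote_params = dict(sampler__k_neighbors=range(3, 6))   # SMOTE spells it k_neighbors
adasyn_params = dict(sampler__n_neighbors=range(3, 6))  # ADASYN spells it n_neighbors

smote_grid = GridSearchCV(smote_pipe, smote_params)
adasyn_grid = GridSearchCV(adasyn_pipe, adasyn_params)

smote_grid.fit(X, y)   # now succeeds
adasyn_grid.fit(X, y)  # unchanged
```

This works, but the grid can no longer be shared or built generically from a list of samplers.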
Expected Results
SMOTE and its variants would benefit from using n_neighbors, giving a consistent API across samplers.
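For illustration only, assuming SMOTE were changed to accept n_neighbors, the single grid from the snippet above could then be shared across both pipelines:

```python
# Hypothetical usage, assuming SMOTE accepted n_neighbors like ADASYN does.
params = dict(sampler__n_neighbors=range(3, 6))
for pipe in (smote_pipe, adasyn_pipe):
    GridSearchCV(pipe, params).fit(X, y)
```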
Versions
Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.1
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0