Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/whats_new/v0.10.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ Compatibility
- Add support for automatic parameters validation as in scikit-learn >= 1.2.
:pr:`955` by :user:`Guillaume Lemaitre <glemaitre>`.

- Add support for `feature_names_in_` as well as `get_feature_names_out` for
all samplers.
:pr:`959` by :user:`Guillaume Lemaitre <glemaitre>`.

Deprecation
...........

Expand Down
14 changes: 13 additions & 1 deletion imblearn/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@

import numpy as np
from sklearn.base import BaseEstimator

try:
# scikit-learn >= 1.2
from sklearn.base import OneToOneFeatureMixin
except ImportError:
from sklearn.base import _OneToOneFeatureMixin as OneToOneFeatureMixin
from sklearn.preprocessing import label_binarize
from sklearn.utils.multiclass import check_classification_targets

Expand Down Expand Up @@ -133,7 +139,7 @@ class attribute, which is a dictionary `param_name: list of constraints`. See
)


class BaseSampler(SamplerMixin, _ParamsValidationMixin):
class BaseSampler(SamplerMixin, OneToOneFeatureMixin, _ParamsValidationMixin):
"""Base class for sampling algorithms.

Warning: This class should not be used directly. Use the derive classes
Expand Down Expand Up @@ -260,6 +266,12 @@ class FunctionSampler(BaseSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
sklearn.preprocessing.FunctionTransfomer : Stateless transformer.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/combine/_smote_enn.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,12 @@ class SMOTEENN(BaseSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTETomek : Over-sample using SMOTE followed by under-sampling removing
Expand Down
6 changes: 6 additions & 0 deletions imblearn/combine/_smote_tomek.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,12 @@ class SMOTETomek(BaseSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTEENN : Over-sample using SMOTE followed by under-sampling using Edited
Expand Down
11 changes: 11 additions & 0 deletions imblearn/metrics/pairwise.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,17 @@ class ValueDifferenceMetric(BaseEstimator, _ParamsValidationMixin):
List of length `n_features` containing the conditional probabilities
for each category given a class.

n_features_in_ : int
Number of features in the input dataset.

.. versionadded:: 0.10

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
sklearn.neighbors.DistanceMetric : Interface for fast metric computation.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/over_sampling/_adasyn.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,12 @@ class ADASYN(BaseOverSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/over_sampling/_random_over_sampler.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,12 @@ class RandomOverSampler(BaseOverSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
BorderlineSMOTE : Over-sample using the borderline-SMOTE variant.
Expand Down
18 changes: 18 additions & 0 deletions imblearn/over_sampling/_smote/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,12 @@ class SMOTE(BaseSMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTENC : Over-sample using SMOTE for continuous and categorical features.
Expand Down Expand Up @@ -442,6 +448,12 @@ class SMOTENC(SMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down Expand Up @@ -759,6 +771,12 @@ class SMOTEN(SMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/over_sampling/_smote/cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,12 @@ class KMeansSMOTE(BaseSMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down
12 changes: 12 additions & 0 deletions imblearn/over_sampling/_smote/filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,12 @@ class BorderlineSMOTE(BaseSMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down Expand Up @@ -352,6 +358,12 @@ class SVMSMOTE(BaseSMOTE):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
SMOTE : Over-sample using SMOTE.
Expand Down
16 changes: 16 additions & 0 deletions imblearn/tests/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
# Christos Aridas
# License: MIT

import warnings
from collections import OrderedDict

import numpy as np
Expand All @@ -19,6 +20,7 @@
from imblearn.under_sampling import NearMiss, RandomUnderSampler
from imblearn.utils.estimator_checks import (
_set_checking_parameters,
check_dataframe_column_names_consistency,
check_param_validation,
parametrize_with_checks,
)
Expand Down Expand Up @@ -92,3 +94,17 @@ def test_strategy_as_ordered_dict(Sampler):
X_res, y_res = sampler.fit_resample(X, y)
assert X_res.shape[0] == sum(strategy.values())
assert y_res.shape[0] == sum(strategy.values())


@pytest.mark.parametrize(
"estimator", _tested_estimators(), ids=_get_check_estimator_ids
)
def test_pandas_column_name_consistency(estimator):
_set_checking_parameters(estimator)
with ignore_warnings(category=(FutureWarning)):
with warnings.catch_warnings(record=True) as record:
check_dataframe_column_names_consistency(
estimator.__class__.__name__, estimator
)
for warning in record:
assert "was fitted without feature names" not in str(warning.message)
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,12 @@ class ClusterCentroids(BaseUnderSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
EditedNearestNeighbours : Under-sampling by editing samples.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,12 @@ class CondensedNearestNeighbour(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
EditedNearestNeighbours : Undersample by editing samples.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,12 @@ class EditedNearestNeighbours(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
CondensedNearestNeighbour : Undersample by condensing samples.
Expand Down Expand Up @@ -251,6 +257,12 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
CondensedNearestNeighbour : Undersample by condensing samples.
Expand Down Expand Up @@ -454,6 +466,12 @@ class without early stopping.

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
CondensedNearestNeighbour: Under-sampling by condensing samples.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,12 @@ class InstanceHardnessThreshold(BaseUnderSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
NearMiss : Undersample based on near-miss search.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/under_sampling/_prototype_selection/_nearmiss.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,12 @@ class NearMiss(BaseUnderSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
RandomUnderSampler : Random undersample the dataset.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,12 @@ class NeighbourhoodCleaningRule(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
EditedNearestNeighbours : Undersample by editing noisy samples.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,12 @@ class OneSidedSelection(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
EditedNearestNeighbours : Undersample by editing noisy samples.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,12 @@ class RandomUnderSampler(BaseUnderSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
NearMiss : Undersample using near-miss samples.
Expand Down
6 changes: 6 additions & 0 deletions imblearn/under_sampling/_prototype_selection/_tomek_links.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,12 @@ class TomekLinks(BaseCleaningSampler):

.. versionadded:: 0.9

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature
names that are all strings.

.. versionadded:: 0.10

See Also
--------
EditedNearestNeighbours : Undersample by samples edition.
Expand Down
Loading