[BUG] validate argument does not work in imblearn.FunctionSampler #782

@raphaelberly

Describe the bug

The validate argument of imblearn.FunctionSampler does not seem to be respected by fit. When the dataset contains null values, fit raises a ValueError even though validate=False and the sampling function does not depend on the dataset values.

Steps/Code to Reproduce

import pandas as pd
from imblearn import FunctionSampler

X = pd.DataFrame([{'a': 1, 'b': 1}, {'a': 1, 'b': None}])
y = pd.Series([1, 0])


def func(X, y):
    return X[:1], y[:1]


sampler = FunctionSampler(func=func, validate=False)
sampler.fit(X, y)

Expected Results

No error is thrown.
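
To make the expected semantics concrete, here is a minimal pure-Python sketch of a sampler whose fit skips input checks when validate is False. SketchSampler and _check_finite are hypothetical names invented for illustration. This is not imblearn's implementation, only an assumption about how the flag is meant to behave.

```python
import math


class SketchSampler:
    """Hypothetical stand-in for illustration; not imblearn's FunctionSampler."""

    def __init__(self, func, validate=True):
        self.func = func
        self.validate = validate

    def _check_finite(self, X):
        # Mimics the kind of finiteness check that appears in the traceback.
        for row in X:
            for value in row:
                if value is None or not math.isfinite(value):
                    raise ValueError("Input contains NaN or infinity.")

    def fit(self, X, y):
        # Inputs are checked only when validation was requested.
        if self.validate:
            self._check_finite(X)
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        return self.func(X, y)


def func(X, y):
    # Same sampling function as in the report: keep the first row only.
    return X[:1], y[:1]


X = [[1.0, 1.0], [1.0, float("nan")]]
y = [1, 0]

# With validate=False, neither fit nor fit_resample inspects the values.
lenient = SketchSampler(func, validate=False).fit(X, y)
X_res, y_res = SketchSampler(func, validate=False).fit_resample(X, y)
```

With validate=True the same call to fit raises ValueError in this sketch, which is the behaviour the report considers correct only when validation is enabled.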

Actual Results

Traceback (most recent call last):
  File "...", line 13, in <module>
    sampler.fit(X, y)
  File ".../site-packages/imblearn/base.py", line 48, in fit
    X, y, _ = self._check_X_y(X, y)
  File ".../site-packages/imblearn/base.py", line 135, in _check_X_y
    X, y, reset=True, accept_sparse=accept_sparse
  File ".../site-packages/sklearn/base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File ".../site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File ".../site-packages/sklearn/utils/validation.py", line 802, in check_X_y
    estimator=estimator)
  File ".../site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File ".../site-packages/sklearn/utils/validation.py", line 645, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File ".../site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Versions

System:
    python: 3.7.7 (default, Mar 10 2020, 15:43:03)  [Clang 11.0.0 (clang-1100.0.33.17)]
   machine: Darwin-19.6.0-x86_64-i386-64bit

Python dependencies:
          pip: 20.2.3
   setuptools: 46.1.2
      sklearn: 0.23.2
        numpy: 1.18.4
        scipy: 1.4.1
       Cython: None
       pandas: 1.1.3
   matplotlib: 3.2.1
       joblib: 0.14.1
threadpoolctl: 2.0.0

Built with OpenMP: True

Labels: Type: Bug