Dead dataset used in documentation #1040

@eroell

Describe the bug

The documentation example for CondensedNearestNeighbour loads its data with fetch_mldata, which no longer exists in scikit-learn, so the example cannot be run.

Steps/Code to Reproduce

from collections import Counter  
from sklearn.datasets import fetch_mldata  
from imblearn.under_sampling import CondensedNearestNeighbour  
pima = fetch_mldata('diabetes_scale')  
X, y = pima['data'], pima['target']  
print('Original dataset shape %s' % Counter(y))  
cnn = CondensedNearestNeighbour(random_state=42)  
X_res, y_res = cnn.fit_resample(X, y)  
print('Resampled dataset shape %s' % Counter(y_res)) 

Expected Results

No error is thrown. The output should be:

Original dataset shape Counter({1: 500, -1: 268})  
Resampled dataset shape Counter({-1: 268, 1: 227})  

Actual Results

ImportError: cannot import name 'fetch_mldata' from 'sklearn.datasets' (/Users/eljas.roellin/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/sklearn/datasets/__init__.py)

Explanation

The fetch_mldata loader has been removed from sklearn.datasets, so this dataset can no longer be fetched this way; see here.

Suggested Solution

I think any other small imbalanced dataset with 2 classes could be used for the example instead.

Versions

System:
    python: 3.11.4 (main, Jul  5 2023, 08:40:20) [Clang 14.0.6 ]
executable: /Users/USER/Documents/imbalance/imbalance_venv/bin/python
   machine: macOS-13.5.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.3.0
          pip: 23.2.1
   setuptools: 65.5.0
        numpy: 1.25.2
        scipy: 1.11.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8
