@lixiilu

@lixiilu lixiilu commented Aug 10, 2020

Related Issue: #738

@pep8speaks

Hello @tangxi1227! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 174:89: E501 line too long (110 > 88 characters)

@chkoar
Member

chkoar commented Aug 11, 2020

Thanks for the PR. Please fix the pep8 issue and add a test.

@chkoar chkoar changed the title from "modify cluster_centroids, fix a bug" to "[WIP] Fix a bug in ClusterCentroids when using the hard voting strategy" Aug 11, 2020
@chkoar chkoar requested a review from glemaitre August 11, 2020 11:59
@lixiilu
Author

lixiilu commented Aug 11, 2020

file: imbalanced-learn/imblearn/under_sampling/_prototype_generation/_cluster_centroids.py
function: _fit_resample
line: 174

def _fit_resample(self, X, y):
    self._validate_estimator()

    if self.voting == "auto":
        if sparse.issparse(X):
            self.voting_ = "hard"
        else:
            self.voting_ = "soft"
    else:
        if self.voting in VOTING_KIND:
            self.voting_ = self.voting
        else:
            raise ValueError(
                "'voting' needs to be one of {}. Got {}"
                " instead.".format(VOTING_KIND, self.voting)
            )

    X_resampled, y_resampled = [], []
    for target_class in np.unique(y):
        if target_class in self.sampling_strategy_.keys():
            n_samples = self.sampling_strategy_[target_class]
            self.estimator_.set_params(**{"n_clusters": n_samples})
            self.estimator_.fit(X[y == target_class])
            X_new, y_new = self._generate_sample(
                X[y == target_class],
                y[y == target_class],
                self.estimator_.cluster_centers_,
                target_class,
            )
            X_resampled.append(X_new)
            y_resampled.append(y_new)
        else:
            target_class_indices = np.flatnonzero(y == target_class)
            X_resampled.append(_safe_indexing(X, target_class_indices))
            y_resampled.append(_safe_indexing(y, target_class_indices))

    if sparse.issparse(X):
        X_resampled = sparse.vstack(X_resampled)
    else:
        X_resampled = np.vstack(X_resampled)
    y_resampled = np.hstack(y_resampled)

    return X_resampled, np.array(y_resampled, dtype=y.dtype)

@glemaitre
Member

A possible test is to check that, after resampling, none of the samples generated for the majority class is a minority sample. This should be quite easy to implement. We also need an entry in the what's new, since the fix affects end users.
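
A minimal sketch of the check suggested above, assuming the bug is that hard voting searched the whole data set (all classes) for the sample nearest each centroid. `hard_vote` and `contains_rows` are hypothetical helpers written for this illustration with NumPy only; they are not imbalanced-learn functions, and the centroids are hard-coded rather than produced by the clustering estimator.

```python
import numpy as np


def hard_vote(pool, centroids):
    """Return, for each centroid, the index of the nearest row in `pool`.

    Hypothetical stand-in for the "hard" voting step, not library code.
    """
    # Pairwise squared Euclidean distances, shape (n_centroids, n_pool).
    diffs = centroids[:, np.newaxis, :] - pool[np.newaxis, :, :]
    distances = (diffs ** 2).sum(axis=2)
    return distances.argmin(axis=1)


def contains_rows(haystack, needles):
    """Return a boolean per row of `needles`: does it also appear in `haystack`?"""
    matches = needles[:, np.newaxis, :] == haystack[np.newaxis, :, :]
    return matches.all(axis=2).any(axis=1)


# Toy data: four majority samples (class 0) and one minority sample (class 1)
# that sits exactly between two majority points.
X = np.array(
    [
        [0.0, 0.0],
        [2.0, 0.0],
        [10.0, 0.0],
        [12.0, 0.0],
        [1.0, 0.0],  # minority sample
    ]
)
y = np.array([0, 0, 0, 0, 1])
majority, minority = 0, 1

# Assumed output of clustering the majority class into two clusters.
centroids = np.array([[1.0, 0.0], [11.0, 0.0]])

# Searching every sample lets a minority point stand in for the majority class.
picked_from_all = hard_vote(X, centroids)
leaked = y[picked_from_all] == minority

# Searching only the majority rows, as the patched call does, cannot leak.
X_majority = X[y == majority]
picked_from_majority = hard_vote(X_majority, centroids)
X_new = X_majority[picked_from_majority]

# The check a regression test could make on the resampled majority part.
no_minority_in_majority = not contains_rows(X_new, X[y == minority]).any()
```

A real test would replace the hard-coded centroids with a call to `ClusterCentroids(voting="hard")` and apply the same row-membership check to the rows of the resampled output labelled as the majority class.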
