Commit db0f17d

Add documentation for SMOTEN
1 parent bc96223 commit db0f17d

2 files changed, 13 insertions(+), 1 deletion(-)

doc/over_sampling.rst

Lines changed: 10 additions & 1 deletion
@@ -183,7 +183,7 @@ parameters ``categorical_features`` either by passing the indices of these
 features or a boolean mask marking these features::
 
 >>> from imblearn.over_sampling import SMOTENC
->>> smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
+>>> smote_nc = SMOTEN(categorical_features=[0, 2], random_state=0)
 >>> X_resampled, y_resampled = smote_nc.fit_resample(X, y)
 >>> print(sorted(Counter(y_resampled).items()))
 [(0, 30), (1, 30)]
@@ -198,6 +198,15 @@ Therefore, it can be seen that the samples generated in the first and last
 columns are belonging to the same categories originally presented without any
 other extra interpolation.
 
+Furthermore, if the dataset consists solely of categorical features, one may use the :class:`SMOTEN` class. This class generates samples in the same fashion as :class:`SMOTENC`, except that only categorical features are permitted. Every feature is treated as categorical, so using ``SMOTEN`` is not advised for datasets that contain both categorical and continuous features::
+
+>>> from imblearn.over_sampling import SMOTEN
+>>> smote_n = SMOTEN(random_state=0)
+>>> X[:, 1] = rng.randint(2, size=n_samples)
+>>> X_resampled, y_resampled = smote_n.fit_resample(X, y)
+>>> print(sorted(Counter(y_resampled).items()))
+[(0, 30), (1, 30)]
+
 .. topic:: References
 
 .. [HWB2005] H. Han, W. Wen-Yuan, M. Bing-Huan, "Borderline-SMOTE: a new
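
The documentation added above says that every feature is treated as categorical and that generated samples only carry categories already present in the data. A minimal standard-library sketch of that idea follows. It is not the library's implementation: the Hamming distance, the per-feature majority vote, and the names `hamming` and `oversample_categorical` are assumptions made purely for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def oversample_categorical(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows from purely categorical minority rows.

    Illustrative only: the distance metric and the voting rule are
    assumptions of this sketch, not the behaviour of any library class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k rows closest to the base row (the base itself has distance 0).
        neighbours = sorted(minority, key=lambda row: hamming(base, row))[:k]
        # Each feature takes the most common category among the neighbours,
        # so no category outside the original data can appear.
        new_row = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*neighbours)
        )
        synthetic.append(new_row)
    return synthetic


# Hypothetical toy data: four minority-class rows with three categorical features.
minority = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "small", "yes"),
    ("red", "small", "no"),
]
new_rows = oversample_categorical(minority, n_new=5)
```

Because each value is chosen by vote rather than interpolated, the sketch never produces an intermediate value such as "half red, half blue", which is why a purely categorical sampler is a poor fit for continuous columns.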

doc/whats_new/v0.5.rst

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@ Enhancement
 and issue template showing how to print system and dependency information
 from the command line. :issue:`557` by :user:`Alexander L. Hayes <batflyer>`.
 
+- Add :class:`SMOTEN`. Add ability to use SMOTE on pure categorical features.
+  by :user:`Thomas Kluiters <ThomasKluiters>`.
+
 Maintenance
 ...........
