@@ -125,22 +125,24 @@ It would also work with pandas dataframe::
125125 >>> df_resampled, y_resampled = rus.fit_resample(df_adult, y_adult)
126126 >>> df_resampled.head() # doctest: +SKIP
127127
128- :class:`NearMiss` adds some heuristic rules to select samples
129- :cite:`mani2003knn`. :class:`NearMiss` implements 3 different types of
130- heuristic which can be selected with the parameter ``version``::
128+ :class:`NearMiss` undersamples data using heuristic rules to select the
129+ observations to keep :cite:`mani2003knn`. :class:`NearMiss` implements 3 different
130+ undersampling methods, which can be selected with the parameter ``version``::
131131
132132 >>> from imblearn.under_sampling import NearMiss
133133 >>> nm1 = NearMiss(version=1)
134134 >>> X_resampled_nm1, y_resampled = nm1.fit_resample(X, y)
135135 >>> print(sorted(Counter(y_resampled).items()))
136136 [(0, 64), (1, 64), (2, 64)]
137137
138- As later stated in the next section, :class:`NearMiss` heuristic rules are
139- based on nearest neighbors algorithm. Therefore, the parameters ``n_neighbors``
140- and ``n_neighbors_ver3`` accept classifier derived from ``KNeighborsMixin``
141- from scikit-learn. The former parameter is used to compute the average distance
142- to the neighbors while the latter is used for the pre-selection of the samples
143- of interest.
138+
139+ :class:`NearMiss` heuristic rules are based on the nearest neighbors algorithm.
140+ Therefore, the parameters ``n_neighbors`` and ``n_neighbors_ver3`` accept either
141+ an integer giving the size of the neighborhood to explore or an estimator derived
142+ from the ``KNeighborsMixin`` from scikit-learn. The parameter ``n_neighbors`` is
143+ used to compute the average distance to the neighbors while ``n_neighbors_ver3``
144+ is used for the pre-selection of the samples from the majority class, only in
145+ version 3. More details about NearMiss are given in the next section.
144146
145147Mathematical formulation
146148^^^^^^^^^^^^^^^^^^^^^^^^
@@ -175,19 +177,16 @@ is the largest.
175177 :scale: 60
176178 :align: center
177179
178- In the next example, the different :class:`NearMiss` variant are applied on the
179- previous toy example. It can be seen that the decision functions obtained in
180+ In the next example, the different :class:`NearMiss` variants are applied to the
181+ previous toy example. We can see that the decision functions obtained in
180182each case are different.
181183
182- When under-sampling a specific class, NearMiss-1 can be altered by the presence
183- of noise. In fact, it will implied that samples of the targeted class will be
184- selected around these samples as it is the case in the illustration below for
185- the yellow class. However, in the normal case, samples next to the boundaries
186- will be selected. NearMiss-2 will not have this effect since it does not focus
187- on the nearest samples but rather on the farthest samples. We can imagine that
188- the presence of noise can also altered the sampling mainly in the presence of
189- marginal outliers. NearMiss-3 is probably the version which will be less
190- affected by noise due to the first step sample selection.
184+ When under-sampling a specific class, NearMiss-1 can be affected by noise. In
185+ fact, samples of the targeted class located around observations from the minority
186+ class tend to be selected, as shown in the illustration below (see yellow class).
187+ NearMiss-2 might be less affected by noise as it does not focus on the nearest
188+ samples but rather on the farthest samples. NearMiss-3 is probably the version
189+ which will be least affected by noise due to the first step of sample selection.
191190
192191.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_003.png
193192 :target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html