From 05e2451f24c1e63dc5a64d22306084c5699222bc Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Mon, 10 Jul 2023 22:04:48 +0200
Subject: [PATCH 1/8] update user guide CNN and OSS

---
 doc/under_sampling.rst | 56 +++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index 9f2795430..bfb6cb039 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -306,20 +306,24 @@ impact by cleaning noisy samples next to the boundaries of the classes.
 
 .. _condensed_nearest_neighbors:
 
-Condensed nearest neighbors and derived algorithms
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Condensed nearest neighbors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 :class:`CondensedNearestNeighbour` uses a 1 nearest neighbor rule to
-iteratively decide if a sample should be removed or not
-:cite:`hart1968condensed`. The algorithm is running as followed:
+iteratively decide if a sample should be removed
+:cite:`hart1968condensed`. The algorithm runs as follows:
 
 1. Get all minority samples in a set :math:`C`.
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
-3. Go through the set :math:`S`, sample by sample, and classify each sample
-   using a 1 nearest neighbor rule.
-4. If the sample is misclassified, add it to :math:`C`, otherwise do nothing.
-5. Reiterate on :math:`S` until there is no samples to be added.
+3. Train a 1-KNN on `C`.
+4. Go through the samples in set :math:`S`, sample by sample, and classify each one
+   using a 1 nearest neighbor rule (trained in 3).
+5. If the sample is misclassified, add it to :math:`C`, and go to step 6.
+6. Repeat steps 3 to 5 until all observations in `S` have been examined.
+
+The final dataset is `C`, containing all observations from the minority class and
+those from the majority that were misclassified by the successive 1-KNN algorithms.
 
 The :class:`CondensedNearestNeighbour` can be used in the following manner::
 
@@ -329,14 +333,32 @@ The :class:`CondensedNearestNeighbour` can be used in the following manner::
     >>> print(sorted(Counter(y_resampled).items()))
     [(0, 64), (1, 24), (2, 115)]
 
-However as illustrated in the figure below, :class:`CondensedNearestNeighbour`
-is sensitive to noise and will add noisy samples.
+:class:`CondensedNearestNeighbour` is sensitive to noise and may add noisy samples
+(see the figure later on).
+
+One Sided Selection
+~~~~~~~~~~~~~~~~~~~
+
+In an attempt to remove noisy observations, :class:`OneSidedSelection`
+will first find the observations that are hard to classify, and then will use
+:class:`TomekLinks` to remove noisy samples :cite:`hart1968condensed`.
+:class:`OneSidedSelection` runs as follows:
 
-In the contrary, :class:`OneSidedSelection` will use :class:`TomekLinks` to
-remove noisy samples :cite:`hart1968condensed`. In addition, the 1 nearest
-neighbor rule is applied to all samples and the one which are misclassified
-will be added to the set :math:`C`. No iteration on the set :math:`S` will take
-place. The class can be used as::
+1. Get all minority samples in a set :math:`C`.
+2. Add a sample from the targeted class (class to be under-sampled) in
+   :math:`C` and all other samples of this class in a set :math:`S`.
+3. Train a 1-KNN on `C`.
+4. Using a 1 nearest neighbor rule trained in 3, classify all samples in
+   set :math:`S`.
+5. Add all misclassified samples to :math:`C`.
+6. Remove Tomek Links from :math:`C`.
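+
+These steps can be illustrated with a minimal sketch built on scikit-learn's
+:class:`~sklearn.neighbors.KNeighborsClassifier`. It is not imbalanced-learn's
+actual implementation, and it leaves out the Tomek links removal of step 6::
+
+    import numpy as np
+    from sklearn.neighbors import KNeighborsClassifier
+
+    def one_sided_selection_sketch(X_minority, X_majority, seed=0):
+        rng = np.random.RandomState(seed)
+        # Steps 1-2: C holds the minority class plus one majority sample
+        # picked at random; S holds the remaining majority samples.
+        i = rng.randint(len(X_majority))
+        C_X = np.vstack([X_minority, X_majority[i:i + 1]])
+        C_y = np.hstack([np.zeros(len(X_minority)), [1]])
+        S = np.delete(X_majority, i, axis=0)
+        # Steps 3-4: train a single 1-KNN on C and classify S in one pass.
+        knn = KNeighborsClassifier(n_neighbors=1).fit(C_X, C_y)
+        # Step 5: majority samples predicted as minority were misclassified;
+        # they are kept together with C (step 6 would remove Tomek links).
+        return np.vstack([C_X, S[knn.predict(S) == 0]])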
+
+The final dataset is `C`, containing all observations from the minority class,
+plus the observations from the majority that were added at random, plus all
+those from the majority that were misclassified by the 1-KNN algorithms. Note
+that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
+does not train a KNN after each sample is missclassified. It uses the one KNN
+to classify all samples from the majority in 1 pass. The class can be used as::
 
     >>> from imblearn.under_sampling import OneSidedSelection
     >>> oss = OneSidedSelection(random_state=0)
     >>> X_resampled, y_resampled = oss.fit_resample(X, y)
     >>> print(sorted(Counter(y_resampled).items()))
     [(0, 64), (1, 174), (2, 4404)]
 
-Our implementation offer to set the number of seeds to put in the set :math:`C`
-originally by setting the parameter ``n_seeds_S``.
+Our implementation offers the possibility to set the number of observations
+added at random to the set :math:`C` through the parameter ``n_seeds_S``.
 
 :class:`NeighbourhoodCleaningRule` will focus on cleaning the data than
 condensing them :cite:`laurikkala2001improving`. Therefore, it will used the

From eb2ec39f9b7484a3c0d43fe2a2242dd1744ab2e1 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Mon, 10 Jul 2023 22:09:13 +0200
Subject: [PATCH 2/8] final touches

---
 doc/under_sampling.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index bfb6cb039..5168e1491 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -339,7 +339,8 @@
 One Sided Selection
 ~~~~~~~~~~~~~~~~~~~
 
-In an attempt to remove noisy observations, :class:`OneSidedSelection`
+In an attempt to remove the noisy observations introduced by
+:class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
 will first find the observations that are hard to classify, and then will use
 :class:`TomekLinks` to remove noisy samples :cite:`hart1968condensed`.
 :class:`OneSidedSelection` runs as follows:

From 9816834e819a05eccbc4ff457f954bcf7d7c8f69 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 10:53:46 +0200
Subject: [PATCH 3/8] expand knn name

Co-authored-by: Guillaume Lemaitre

---
 doc/under_sampling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index 5168e1491..c4565f0cf 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -348,7 +348,7 @@ will first find the observations that are hard to classify, and then will use
 1. Get all minority samples in a set :math:`C`.
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
-3. Train a 1-KNN on `C`.
+3. Train a 1-Nearest Neighbors on `C`.
 4. Using a 1 nearest neighbor rule trained in 3, classify all samples in
    set :math:`S`.
 5. Add all misclassified samples to :math:`C`.

From aa80fd7c6edf22a558d4330a4bca141549f48976 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 10:54:16 +0200
Subject: [PATCH 4/8] add missing math instruction

Co-authored-by: Guillaume Lemaitre

---
 doc/under_sampling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index c4565f0cf..4d4e70b04 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -354,7 +354,7 @@ will first find the observations that are hard to classify, and then will use
 5. Add all misclassified samples to :math:`C`.
 6. Remove Tomek Links from :math:`C`.
 
-The final dataset is `C`, containing all observations from the minority class,
+The final dataset is :math:`C`, containing all observations from the minority class,
 plus the observations from the majority that were added at random, plus all
 those from the majority that were misclassified by the 1-KNN algorithms. Note

From e361d85bb3dbf00718e41a535577f4d54de28540 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 10:54:35 +0200
Subject: [PATCH 5/8] expand knn name

Co-authored-by: Guillaume Lemaitre

---
 doc/under_sampling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index 4d4e70b04..acdf08880 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -356,7 +356,7 @@ will first find the observations that are hard to classify, and then will use
 The final dataset is :math:`C`, containing all observations from the minority class,
 plus the observations from the majority that were added at random, plus all
-those from the majority that were misclassified by the 1-KNN algorithms. Note
+those from the majority that were misclassified by the 1-Nearest Neighbors algorithms. Note
 that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
 does not train a KNN after each sample is missclassified. It uses the one KNN
 to classify all samples from the majority in 1 pass. The class can be used as::

From ad739f38b3c41b7b1450b0749f77b9c2fb74709d Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 10:55:06 +0200
Subject: [PATCH 6/8] expand knn name and fix typo

Co-authored-by: Guillaume Lemaitre

---
 doc/under_sampling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index acdf08880..a1b3ce48e 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -358,7 +358,7 @@ will first find the observations that are hard to classify, and then will use
 The final dataset is :math:`C`, containing all observations from the minority class,
 plus the observations from the majority that were added at random, plus all
 those from the majority that were misclassified by the 1-Nearest Neighbors algorithms. Note
 that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
-does not train a KNN after each sample is missclassified. It uses the one KNN
+does not train a K-Nearest Neighbors after each sample is misclassified. It uses the one K-Nearest Neighbors
 to classify all samples from the majority in 1 pass. The class can be used as::

From 1c7e04e4485e6e4eafeb1a317ec602e404d98459 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 11:05:25 +0200
Subject: [PATCH 7/8] expanded knn to full name and added missing :math:

---
 doc/under_sampling.rst | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index a1b3ce48e..9d717ccba 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -316,14 +316,15 @@ iteratively decide if a sample should be removed
 1. Get all minority samples in a set :math:`C`.
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
-3. Train a 1-KNN on `C`.
+3. Train a 1-Nearest Neighbour on :math:`C`.
 4. Go through the samples in set :math:`S`, sample by sample, and classify each one
    using a 1 nearest neighbor rule (trained in 3).
 5. If the sample is misclassified, add it to :math:`C`, and go to step 6.
-6. Repeat steps 3 to 5 until all observations in `S` have been examined.
+6. Repeat steps 3 to 5 until all observations in :math:`S` have been examined.
 
-The final dataset is `C`, containing all observations from the minority class and
-those from the majority that were misclassified by the successive 1-KNN algorithms.
+The final dataset is :math:`C`, containing all observations from the minority class and
+those from the majority that were misclassified by the successive
+1-Nearest Neighbour algorithms.
 
 The :class:`CondensedNearestNeighbour` can be used in the following manner::
@@ -348,7 +349,7 @@ will first find the observations that are hard to classify, and then will use
 1. Get all minority samples in a set :math:`C`.
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
-3. Train a 1-Nearest Neighbors on `C`.
+3. Train a 1-Nearest Neighbors on :math:`C`.
 4. Using a 1 nearest neighbor rule trained in 3, classify all samples in
    set :math:`S`.
 5. Add all misclassified samples to :math:`C`.
 6. Remove Tomek Links from :math:`C`.
 
 The final dataset is :math:`C`, containing all observations from the minority class,
 plus the observations from the majority that were added at random, plus all
-those from the majority that were misclassified by the 1-Nearest Neighbors algorithms. Note
-that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
-does not train a K-Nearest Neighbors after each sample is misclassified. It uses the one K-Nearest Neighbors
-to classify all samples from the majority in 1 pass. The class can be used as::
+those from the majority that were misclassified by the 1-Nearest Neighbors algorithms.
+Note that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
+does not train a K-Nearest Neighbors after each sample is misclassified. It uses the
+1-Nearest Neighbors from step 3 to classify all samples from the majority in 1 pass.
+The class can be used as::

From 3a787f450d0ecced3a67fab45ceaecca259b4154 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Tue, 11 Jul 2023 11:06:51 +0200
Subject: [PATCH 8/8] split paragraph

---
 doc/under_sampling.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/under_sampling.rst b/doc/under_sampling.rst
index 9d717ccba..fd9f43c0e 100644
--- a/doc/under_sampling.rst
+++ b/doc/under_sampling.rst
@@ -358,6 +358,7 @@ will first find the observations that are hard to classify, and then will use
 The final dataset is :math:`C`, containing all observations from the minority class,
 plus the observations from the majority that were added at random, plus all
 those from the majority that were misclassified by the 1-Nearest Neighbors algorithms.
+
 Note that, unlike :class:`CondensedNearestNeighbour`, :class:`OneSidedSelection`
 does not train a K-Nearest Neighbors after each sample is misclassified. It uses the
 1-Nearest Neighbors from step 3 to classify all samples from the majority in 1 pass.
 The class can be used as::
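
In contrast with the one-pass behaviour of :class:`OneSidedSelection`, the
condensed nearest neighbours loop described in patches 1 and 7 retrains the
classifier after each addition. Here is a minimal sketch of that loop, again
assuming scikit-learn's ``KNeighborsClassifier`` (binary case, for
illustration only; not imbalanced-learn's actual implementation)::

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def condensed_nn_sketch(X_minority, X_majority, seed=0):
        rng = np.random.RandomState(seed)
        # Steps 1-2: C starts as the minority class plus one majority
        # sample picked at random; the rest of the majority forms S.
        i = rng.randint(len(X_majority))
        C_X = np.vstack([X_minority, X_majority[i:i + 1]])
        C_y = np.hstack([np.zeros(len(X_minority)), [1]])
        for x in np.delete(X_majority, i, axis=0):
            # Step 3: retrain a 1-Nearest Neighbour on the current C.
            knn = KNeighborsClassifier(n_neighbors=1).fit(C_X, C_y)
            # Steps 4-6: classify the next sample of S and add it to C
            # when it is misclassified as belonging to the minority class.
            if knn.predict(x.reshape(1, -1))[0] == 0:
                C_X = np.vstack([C_X, x.reshape(1, -1)])
                C_y = np.hstack([C_y, [1]])
        return C_X  # the final dataset C

The retraining inside the loop is the key difference from the one-pass
:class:`OneSidedSelection` sketch shown in patch 1.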