diff --git a/src/python/docs/docstrings/EnsembleClassifier.txt b/src/python/docs/docstrings/EnsembleClassifier.txt
index 47cbdd13..9feb242e 100644
--- a/src/python/docs/docstrings/EnsembleClassifier.txt
+++ b/src/python/docs/docstrings/EnsembleClassifier.txt
@@ -30,14 +30,14 @@
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``ClassifierAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -51,9 +51,9 @@
       or ``"LogLossReduction"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``ClassifierAverage``: computes the average of the scores produced by
       the trained models.
 
@@ -92,7 +92,7 @@
       and ``0 <= b <= 1`` and ``b - a = 1``. This normalizer preserves
       sparsity by mapping zero to zero.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
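
Note: as a quick orientation for the parameters this docstring covers, below is a minimal usage sketch. It is not part of this PR; it assumes `EnsembleClassifier` is importable from `nimbusml.ensemble` (as the file layout in this diff suggests), and the data frame and column names are hypothetical.

```python
# Minimal sketch, not from this PR: exercising only parameters documented
# above. Assumes nimbusml is installed and EnsembleClassifier is exported
# from nimbusml.ensemble, as this diff's file paths suggest.
import pandas as pd
from nimbusml.ensemble import EnsembleClassifier

# Hypothetical training data: two numeric features and an integer label.
train = pd.DataFrame({
    'f0': [0.1, 0.9, 0.4, 0.7],
    'f1': [1.0, 0.2, 0.8, 0.3],
    'label': [0, 1, 0, 1],
})

# num_models: number of sampled training subsets (default 50);
# batch_size=-1 keeps the default of training on the whole set at once;
# train_parallel=True runs the base learners asynchronously.
model = EnsembleClassifier(num_models=10, batch_size=-1, train_parallel=True)
model.fit(train[['f0', 'f1']], train['label'])
predictions = model.predict(train[['f0', 'f1']])
```

The same `num_models`, `sub_model_selector_type`, `output_combiner`, and `batch_size` wording recurs in every variant of this docstring below.
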
diff --git a/src/python/docs/docstrings/EnsembleRegressor.txt b/src/python/docs/docstrings/EnsembleRegressor.txt
index cb859674..80703736 100644
--- a/src/python/docs/docstrings/EnsembleRegressor.txt
+++ b/src/python/docs/docstrings/EnsembleRegressor.txt
@@ -30,14 +30,14 @@
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``RegressorAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -51,9 +51,9 @@
       ``"RSquared"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``RegressorAverage``: computes the average of the scores produced by
       the trained models.
 
@@ -86,7 +86,7 @@
       and ``0 <= b <= 1`` and ``b - a = 1``. This normalizer preserves
       sparsity by mapping zero to zero.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
diff --git a/src/python/docs/docstrings/LinearSvmBinaryClassifier.txt b/src/python/docs/docstrings/LinearSvmBinaryClassifier.txt
index 4a9cf5ad..9f3d544b 100644
--- a/src/python/docs/docstrings/LinearSvmBinaryClassifier.txt
+++ b/src/python/docs/docstrings/LinearSvmBinaryClassifier.txt
@@ -5,12 +5,10 @@
 .. remarks::
     Linear SVM implements an algorithm that finds a hyperplane in the
     feature space for binary classification, by solving an SVM problem.
-    For instance, with feature values $f_0, f_1,..., f_{D-1}$, the
-    prediction is given by determining what side of the hyperplane the
-    point falls into. That is the same as the sign of the feautures'
-    weighted sum, i.e. $\sum_{i = 0}^{D-1} \left(w_i * f_i \right) + b$,
-    where $w_0, w_1,..., w_{D-1}$ are the weights computed by the
-    algorithm, and *b* is the bias computed by the algorithm.
+    For instance, for a given feature vector, the prediction is given by
+    determining what side of the hyperplane the point falls into. That is
+    the same as the sign of the features' weighted sum (the weights being
+    computed by the algorithm) plus the bias computed by the algorithm.
 
     This algorithm implemented is the PEGASOS method, which alternates
     between stochastic gradient descent steps and projection steps,
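
Note: the rewritten remarks drop the explicit formula that the removed lines carried. For reference, the decision rule the new wording describes is $\mathrm{sign}\left(\sum_{i = 0}^{D-1} \left(w_i * f_i\right) + b\right)$, with weights $w_i$ and bias $b$ learned by the algorithm. A minimal NumPy illustration, with hypothetical stand-ins for the learned values:

```python
import numpy as np

# Hypothetical learned parameters; a trained LinearSvmBinaryClassifier
# would produce values like these.
w = np.array([0.5, -1.2, 0.3])   # one weight per feature
b = 0.25                         # bias

f = np.array([1.0, 0.4, -2.0])   # a feature vector to classify

# The prediction is the side of the hyperplane the point falls on,
# i.e. the sign of the features' weighted sum plus the bias.
score = np.dot(w, f) + b
prediction = 1 if score >= 0 else 0
```
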
diff --git a/src/python/nimbusml/ensemble/ensembleclassifier.py b/src/python/nimbusml/ensemble/ensembleclassifier.py
index d99e3b71..aa5f1720 100644
--- a/src/python/nimbusml/ensemble/ensembleclassifier.py
+++ b/src/python/nimbusml/ensemble/ensembleclassifier.py
@@ -57,14 +57,14 @@ class EnsembleClassifier(core, BasePredictor, ClassifierMixin):
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``ClassifierAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -77,9 +77,9 @@ class EnsembleClassifier(core, BasePredictor, ClassifierMixin):
       ``"AccuracyMicro"``, ``"AccuracyMacro"``, ``"LogLoss"``, or
       ``"LogLossReduction"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``ClassifierAverage``: computes the average of the scores produced by
       the trained models.
@@ -123,7 +123,7 @@ class EnsembleClassifier(core, BasePredictor, ClassifierMixin):
     :param train_parallel: All the base learners will run asynchronously
       if the value is true.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
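
Note: a matching sketch for the regressor variant, again not from this PR. The import path is assumed from this diff's file layout, only parameters documented above are used, and the data is hypothetical; unlike the classifier sketch, this one exercises a positive `batch_size`.

```python
# Minimal sketch, not from this PR: the regressor mirrors the classifier's
# ensemble parameters. Import path assumed from this diff's file layout.
import pandas as pd
from nimbusml.ensemble import EnsembleRegressor

# Hypothetical training data: two numeric features and a numeric target.
train = pd.DataFrame({
    'f0': [0.1, 0.9, 0.4, 0.7],
    'f1': [1.0, 0.2, 0.8, 0.3],
    'target': [1.5, 3.2, 2.1, 2.9],
})

# batch_size > 0 trains iteratively on subsets of that size; the docs
# above note this assumes the training set is already well shuffled,
# and that num_models then counts models per batch.
model = EnsembleRegressor(num_models=10, batch_size=2)
model.fit(train[['f0', 'f1']], train['target'])
scores = model.predict(train[['f0', 'f1']])
```
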
diff --git a/src/python/nimbusml/ensemble/ensembleregressor.py b/src/python/nimbusml/ensemble/ensembleregressor.py
index 1b7aac76..abf3744e 100644
--- a/src/python/nimbusml/ensemble/ensembleregressor.py
+++ b/src/python/nimbusml/ensemble/ensembleregressor.py
@@ -57,14 +57,14 @@ class EnsembleRegressor(core, BasePredictor, RegressorMixin):
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``RegressorAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -77,9 +77,9 @@ class EnsembleRegressor(core, BasePredictor, RegressorMixin):
       can be ``"L1"``, ``"L2"``, ``"Rms"``, or ``"Loss"``, or
       ``"RSquared"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``RegressorAverage``: computes the average of the scores produced by
       the trained models.
@@ -117,7 +117,7 @@ class EnsembleRegressor(core, BasePredictor, RegressorMixin):
     :param train_parallel: All the base learners will run asynchronously
       if the value is true.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
diff --git a/src/python/nimbusml/internal/core/ensemble/ensembleclassifier.py b/src/python/nimbusml/internal/core/ensemble/ensembleclassifier.py
index 083413f1..3119060c 100644
--- a/src/python/nimbusml/internal/core/ensemble/ensembleclassifier.py
+++ b/src/python/nimbusml/internal/core/ensemble/ensembleclassifier.py
@@ -57,14 +57,14 @@ class EnsembleClassifier(
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``ClassifierAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -77,9 +77,9 @@ class EnsembleClassifier(
       ``"AccuracyMicro"``, ``"AccuracyMacro"``, ``"LogLoss"``, or
       ``"LogLossReduction"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``ClassifierAverage``: computes the average of the scores produced by
       the trained models.
@@ -123,7 +123,7 @@ class EnsembleClassifier(
     :param train_parallel: All the base learners will run asynchronously
       if the value is true.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
diff --git a/src/python/nimbusml/internal/core/ensemble/ensembleregressor.py b/src/python/nimbusml/internal/core/ensemble/ensembleregressor.py
index cc0935c7..efb18347 100644
--- a/src/python/nimbusml/internal/core/ensemble/ensembleregressor.py
+++ b/src/python/nimbusml/internal/core/ensemble/ensembleregressor.py
@@ -55,14 +55,14 @@ class EnsembleRegressor(
     * ``RandomFeatureSelector``: selects a random subset of the features
       for each model.
 
-    :param num_models: indicates the number models to train, i.e. the number of
+    :param num_models: Indicates the number of models to train, i.e. the number of
       subsets of the training set to sample. The default value is 50. If
       batches are used then this indicates the number of models per batch.
 
     :param sub_model_selector_type: Determines the efficient set of models the
-      ``output_combiner`` uses, and removes the least significant models. This is
-      used to improve the accuracy and reduce the model size. This is also called
-      pruning.
+      ``output_combiner`` uses, and removes the least significant models.
+      This is used to improve the accuracy and reduce the model size. This is
+      also called pruning.
 
     * ``RegressorAllSelector``: does not perform any pruning and selects
       all models in the ensemble to combine to create the output. This is
@@ -75,9 +75,9 @@ class EnsembleRegressor(
       can be ``"L1"``, ``"L2"``, ``"Rms"``, or ``"Loss"``, or
       ``"RSquared"``.
 
-    :param output_combiner: indicates how to combine the predictions of the different
-      models into a single prediction. There are five available output
-      combiners for clasification:
+    :param output_combiner: Indicates how to combine the predictions of the
+      different models into a single prediction. There are five available
+      output combiners for classification:
 
     * ``RegressorAverage``: computes the average of the scores produced by
       the trained models.
@@ -115,7 +115,7 @@ class EnsembleRegressor(
     :param train_parallel: All the base learners will run asynchronously
       if the value is true.
 
-    :param batch_size: train the models iteratively on subsets of the training
+    :param batch_size: Train the models iteratively on subsets of the training
       set of this size. When using this option, it is assumed that the
       training set is randomized enough so that every batch is a random
       sample of instances. The default value is -1, indicating using the
diff --git a/src/python/nimbusml/internal/core/linear_model/linearsvmbinaryclassifier.py b/src/python/nimbusml/internal/core/linear_model/linearsvmbinaryclassifier.py
index 41996a33..0109ba44 100644
--- a/src/python/nimbusml/internal/core/linear_model/linearsvmbinaryclassifier.py
+++ b/src/python/nimbusml/internal/core/linear_model/linearsvmbinaryclassifier.py
@@ -26,12 +26,10 @@ class LinearSvmBinaryClassifier(
 .. remarks::
     Linear SVM implements an algorithm that finds a hyperplane in the
     feature space for binary classification, by solving an SVM problem.
-    For instance, with feature values $f_0, f_1,..., f_{D-1}$, the
-    prediction is given by determining what side of the hyperplane the
-    point falls into. That is the same as the sign of the feautures'
-    weighted sum, i.e. $\sum_{i = 0}^{D-1} \left(w_i * f_i \right) + b$,
-    where $w_0, w_1,..., w_{D-1}$ are the weights computed by the
-    algorithm, and *b* is the bias computed by the algorithm.
+    For instance, for a given feature vector, the prediction is given by
+    determining what side of the hyperplane the point falls into. That is
+    the same as the sign of the features' weighted sum (the weights being
+    computed by the algorithm) plus the bias computed by the algorithm.
 
     This algorithm implemented is the PEGASOS method, which alternates
     between stochastic gradient descent steps and projection steps,
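
Note: the remarks in these files name the PEGASOS method without detail. Below is a hedged sketch of the textbook Pegasos update (Shalev-Shwartz et al.), alternating a stochastic subgradient step on the hinge loss with a projection onto the ball of radius $1/\sqrt{\lambda}$. It illustrates the published algorithm only, not nimbusml's internal implementation, and it omits the bias term for brevity.

```python
import numpy as np

def pegasos_sketch(X, y, lam=0.01, n_iters=1000, seed=0):
    """Textbook Pegasos: SGD on the hinge loss plus a projection step.

    Illustration of the published algorithm, not the nimbusml internals.
    X is (n, d); y holds labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)            # decaying step size schedule
        if y[i] * np.dot(w, X[i]) < 1:   # margin violated: gradient step
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # only shrink by regularization
            w = (1 - eta * lam) * w
        # projection step: keep w inside the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```
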
diff --git a/src/python/nimbusml/linear_model/linearsvmbinaryclassifier.py b/src/python/nimbusml/linear_model/linearsvmbinaryclassifier.py
index 4f900aea..27511e27 100644
--- a/src/python/nimbusml/linear_model/linearsvmbinaryclassifier.py
+++ b/src/python/nimbusml/linear_model/linearsvmbinaryclassifier.py
@@ -29,12 +29,10 @@ class LinearSvmBinaryClassifier(
 .. remarks::
     Linear SVM implements an algorithm that finds a hyperplane in the
     feature space for binary classification, by solving an SVM problem.
-    For instance, with feature values $f_0, f_1,..., f_{D-1}$, the
-    prediction is given by determining what side of the hyperplane the
-    point falls into. That is the same as the sign of the feautures'
-    weighted sum, i.e. $\sum_{i = 0}^{D-1} \left(w_i * f_i \right) + b$,
-    where $w_0, w_1,..., w_{D-1}$ are the weights computed by the
-    algorithm, and *b* is the bias computed by the algorithm.
+    For instance, for a given feature vector, the prediction is given by
+    determining what side of the hyperplane the point falls into. That is
+    the same as the sign of the features' weighted sum (the weights being
+    computed by the algorithm) plus the bias computed by the algorithm.
 
     This algorithm implemented is the PEGASOS method, which alternates
     between stochastic gradient descent steps and projection steps,