Merge pull request #2 from Microsoft/master

ganik · web-flow · commit 725be2e50788 · 2018-11-23T10:16:05.000-08:00
Merge
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 
 ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and others. `nimbusml` was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance. 
 
-This package enables training ML.NET pipelines or integrating ML.NET components directly into Scikit-Learn pipelines (it supports `numpy.ndarray`, `scipy.sparse.csr_matrix`, and `pandas.DataFrame` as inputs).
+This package enables training ML.NET pipelines or integrating ML.NET components directly into [scikit-learn](https://scikit-learn.org/stable/) pipelines (it supports `numpy.ndarray`, `scipy.sparse.csr_matrix`, and `pandas.DataFrame` as inputs).
 
 Documentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) and additional notebook samples can be found [here](https://github.com/Microsoft/NimbusML-Samples).
 
@@ -48,7 +48,7 @@ pipeline.fit(train_data)
 results = pipeline.predict(test_data)
 ```
 
-Instead of creating an `nimbusml` pipeline, you can also integrate components into Scikit-Learn pipelines:
+Instead of creating an `nimbusml` pipeline, you can also integrate components into scikit-learn pipelines:
 
 ```python
 from sklearn.pipeline import Pipeline
diff --git a/src/python/docs/sphinx/concepts/datasources.rst b/src/python/docs/sphinx/concepts/datasources.rst
@@ -122,7 +122,7 @@ Output Data Types of Transforms
 
 The return type of all of the transforms is a ``pandas.DataFrame``, when they
 are used inside a `sklearn.pipeline.Pipeline
-<http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
+<https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
 or when they are used individually.
 
 However, when used inside a :py:class:`nimbusml.Pipeline`, the outputs are often stored in
diff --git a/src/python/docs/sphinx/concepts/experimentvspipeline.rst b/src/python/docs/sphinx/concepts/experimentvspipeline.rst
@@ -9,15 +9,15 @@ nimbusml.Pipeline() versus sklearn.Pipeline()
 .. contents::
     :local:
 
-This section highlights the differences between using a `sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_ 
+This section highlights the differences between using a `sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_ 
 and :py:class:`nimbusml.Pipeline` to compose a sequence of transformers and/or trainers.
 
  
 sklearn.Pipeline
 ----------------
 
 ``nimbusml`` transforms and trainers are designed to be compatible with
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_. 
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_. 
 For fully optimized performance and added functionality, it is recommended to use
 :py:class:`nimbusml.Pipeline`. See below for more details.
 
@@ -38,15 +38,15 @@ files that are too large to fit into memory, there is no easy way to train estim
 streaming the examples one at a time.
 
 The :py:class:`nimbusml.Pipeline` module accepts inputs X and y similarly to
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, but also
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, but also
 inputs of type :py:class:`nimbusml.FileDataStream`, which is an optimized streaming file
 reader class. This is highly recommended for large datasets. See [Data Sources](datasources.md#data-from-a-filedatastream) for an
 example of using Pipeline with FileDataStream to read data in files.
 
 Select which Columns to Transform
 """""""""""""""""""""""""""""""""
 
-When using `sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
+When using `sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
 the data columns of X and y (of type ``numpy.array`` or ``scipy.sparse_csr``)
 are anonymous and cannot be referenced by name. Operations and transformations are
 therefore performed on all columns of the data.
@@ -66,7 +66,7 @@ Optimized Chaining of Trainers/Transforms
 
 Using NimbusML, trainers and transforms within a :py:class:`nimbusml.Pipeline` will
 generally result in better performance compared to using them in a
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
 Data copying is minimized when processing is limited to within the C# libraries, and if all
 components are in the same pipeline, data copies between C# and Python are reduced.
 
diff --git a/src/python/docs/sphinx/concepts/types.rst b/src/python/docs/sphinx/concepts/types.rst
@@ -61,7 +61,7 @@ dataframe and therefore the column_name can still be used to refer to the Vector
     efficiently without any conversion to a dataframe. Since the ``column_name`` of the vector is
     also preserved, it is possible to refer to it by downstream transforms by name. However, when
     transforms are used inside a `sklearn.pipeline.Pipeline()
-    <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, the output
+    <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, the output
     of every transform is converted to a ``pandas.DataFrame`` first where the names of ``slots`` are
     preserved, but the ``column_name`` of the vector is dropped.
 
diff --git a/src/python/docs/sphinx/metrics.rst b/src/python/docs/sphinx/metrics.rst
@@ -58,7 +58,7 @@ This corresponds to evaltype='binary'.
     in `ML.NET <https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet>`_).
     This expression is asymptotically equivalent to the area under the curve
     which is what
-    `scikit-learn <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html>`_
+    `scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html>`_
     computes
     (see `auc <https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/metrics/ranking.py#L101>`_).
     That explains discrepancies on small test sets.
diff --git a/src/python/nimbusml/datasets/datasets.py b/src/python/nimbusml/datasets/datasets.py
@@ -75,7 +75,7 @@ def as_df(self):
 
 class DataSetIris(DataSet):
     """
-    `Iris dataset <http://scikit-learn.org/stable/auto_examples/datasets
+    `Iris dataset <https://scikit-learn.org/stable/auto_examples/datasets
     /plot_iris_dataset.html>`_.
     """