This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 725be2e

Merge pull request #2 from Microsoft/master
Merge
2 parents 45be3d7 + bec566c commit 725be2e

6 files changed (+11 -11 lines changed)
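
Nine of the eleven changed lines below only switch `scikit-learn.org` links from `http://` to `https://`. The other two, both in README.md, change "Scikit-Learn" to "scikit-learn", and one of them also adds a link. Because the scheme change is mechanical, here is a minimal sketch of how it could be scripted. The commit does not say how its edits were made. The helper names `upgrade_sklearn_links` and `count_insecure_links` are hypothetical and are not part of the nimbusml repository.

```python
import re

# Hypothetical helpers (not part of nimbusml). The pattern matches "http://"
# only when it is immediately followed by "scikit-learn.org/". The lookahead
# leaves the host and path untouched, so only the scheme is rewritten.
_HTTP_SKLEARN = re.compile(r"http://(?=scikit-learn\.org/)")


def upgrade_sklearn_links(text: str) -> str:
    """Return `text` with every http://scikit-learn.org/ link switched to https."""
    return _HTTP_SKLEARN.sub("https://", text)


def count_insecure_links(text: str) -> int:
    """Count the plain-HTTP scikit-learn.org links still present in `text`."""
    return len(_HTTP_SKLEARN.findall(text))


if __name__ == "__main__":
    before = (
        "`sklearn.Pipeline <http://scikit-learn.org/stable/modules/"
        "generated/sklearn.pipeline.Pipeline.html>`_"
    )
    after = upgrade_sklearn_links(before)
    print(after)
    print(count_insecure_links(before), count_insecure_links(after))  # prints: 1 0
```

Links that already use `https://` are left alone because "https://" never contains the substring "http://". Running the function twice therefore gives the same result as running it once. The two README wording changes are not covered by this sketch.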


README.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 
 ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and others. `nimbusml` was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.
 
-This package enables training ML.NET pipelines or integrating ML.NET components directly into Scikit-Learn pipelines (it supports `numpy.ndarray`, `scipy.sparse_cst`, and `pandas.DataFrame` as inputs).
+This package enables training ML.NET pipelines or integrating ML.NET components directly into [scikit-learn](https://scikit-learn.org/stable/) pipelines (it supports `numpy.ndarray`, `scipy.sparse_cst`, and `pandas.DataFrame` as inputs).
 
 Documentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) and additional notebook samples can be found [here](https://github.com/Microsoft/NimbusML-Samples).
 
@@ -48,7 +48,7 @@ pipeline.fit(train_data)
 results = pipeline.predict(test_data)
 ```
 
-Instead of creating an `nimbusml` pipeline, you can also integrate components into Scikit-Learn pipelines:
+Instead of creating an `nimbusml` pipeline, you can also integrate components into scikit-learn pipelines:
 
 ```python
 from sklearn.pipeline import Pipeline

src/python/docs/sphinx/concepts/datasources.rst

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ Output Data Types of Transforms
 
 The return type of all of the transforms is a ``pandas.DataFrame``, when they
 are used inside a `sklearn.pipeline.Pipeline
-<http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
+<https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
 or when they are used individually.
 
 However, when used inside a :py:class:`nimbusml.Pipeline`, the outputs are often stored in

src/python/docs/sphinx/concepts/experimentvspipeline.rst

Lines changed: 5 additions & 5 deletions
@@ -9,15 +9,15 @@ nimbusml.Pipeline() versus sklearn.Pipeline()
 .. contents::
 :local:
 
-This sections highlights the differences between using a `sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
+This sections highlights the differences between using a `sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
 and :py:class:`nimbusml.Pipeline` to compose a sequence of transformers and/or trainers.
 
 
 sklearn.Pipeline
 ----------------
 
 ``nimbusml`` transforms and trainers are designed to be compatible with
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
 For fully optimized performance and added functionality, it is recommended to use
 :py:class:`nimbusml.Pipeline`. See below for more details.
 
@@ -38,15 +38,15 @@ files that are too large to fit into memory, there is no easy way to train estim
 streaming the examples one at a time.
 
 The :py:class:`nimbusml.Pipeline` module accepts inputs X and y similarly to
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, but also
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, but also
 inputs of type :py:class:`nimbusml.FileDataStream`, which is an optimized streaming file
 reader class. This is highly recommended for large datasets. See [Data Sources](datasources.md#data-from-a-filedatastream) for an
 example of using Pipeline with FileDataStream to read data in files.
 
 Select which Columns to Transform
 """""""""""""""""""""""""""""""""
 
-When using `sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
+When using `sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
 the data columns of X and y (of type``numpy.array`` or ``scipy.sparse_csr``)
 are anonymous and cannot be referenced by name. Operations and transformations are
 therefore performed on all columns of the data.
@@ -66,7 +66,7 @@ Optimized Chaining of Trainers/Transforms
 
 Using NimbusML, trainers and transforms within a :py:class:`nimbusml.Pipeline` will
 generally result in better performance compared to using them in a
-`sklearn.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
+`sklearn.Pipeline <https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_.
 Data copying is minimized when processing is limited to within the C# libraries, and if all
 components are in the same pipeline, data copies between C# and Python is reduced.
 
src/python/docs/sphinx/concepts/types.rst

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ dataframe and therefore the column_name can still be used to refer to the Vector
 efficiently without any conversion to a dataframe. Since the ``column_name`` of the vector is
 also preserved, it is possible to refer to it by downstream transforms by name. However, when
 transforms are used inside a `sklearn.pipeline.Pipeline()
-<http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, the output
+<https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_, the output
 of every transform is converted to a ``pandas.DataFrame`` first where the names of ``slots`` are
 preserved, but the ``column_name`` of the vector is dropped.
 
src/python/docs/sphinx/metrics.rst

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ This corresponds to evaltype='binary'.
 in `ML.NET <https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet>`_).
 This expression is asymptotically equivalent to the area under the curve
 which is what
-`scikit-learn <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html>`_ computation.
+`scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html>`_ computation.
 computes
 (see `auc <https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/metrics/ranking.py#L101>`_).
 That explains discrepencies on small test sets.

src/python/nimbusml/datasets/datasets.py

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ def as_df(self):
 
 class DataSetIris(DataSet):
 """
-`Iris dataset <http://scikit-learn.org/stable/auto_examples/datasets
+`Iris dataset <https://scikit-learn.org/stable/auto_examples/datasets
 /plot_iris_dataset.html>`_ dataset.
 """
 