From 4f241e91a8cb8d32c5c6c420f34cb0febd7026c4 Mon Sep 17 00:00:00 2001
From: Gal Oshri
Date: Mon, 4 Jun 2018 17:42:18 -0700
Subject: [PATCH 1/3] Add release notes for ML.NET 0.2

---
 docs/release-notes/0.2/release-0.2.md | 57 +++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 docs/release-notes/0.2/release-0.2.md

diff --git a/docs/release-notes/0.2/release-0.2.md b/docs/release-notes/0.2/release-0.2.md
new file mode 100644
index 0000000000..cc81e20924
--- /dev/null
+++ b/docs/release-notes/0.2/release-0.2.md
@@ -0,0 +1,57 @@
+# ML.NET 0.2 Release Notes
+
+We would like to thank the community for the engagement so far and helping us shape ML.NET.
+
+Today we are releasing ML.NET 0.2. This release focuses on addressing questions/issues, adding clustering to the list of supported machine learning tasks, enabling using data from memory to train models, easier model validation, and more.
+
+### Installation
+
+Supported platforms: Windows, MacOS, Linux (see [supported OS versions of .NET Core 2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md) for more detail)
+
+You can install ML.NET NuGet from the CLI using:
+```
+dotnet add package Microsoft.ML
+```
+
+From package manager:
+```
+Install-Package Microsoft.ML
+```
+
+### Release Notes
+
+Below are some of the highlights from this release.
+
+#### New machine learning tasks: clustering
+
+Clustering is an unsupervised learning task that groups sets of items based on their features. It identifies which items are more similar to each other than other items. This might be useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.
+
+ML.NET 0.2 exposes `KMeansPlusPlusClusterer` which implments K-Means++ clustering. [This test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs) shows how to use it.
+
+#### Train using data objects in addition to loading data from a file: `CollectionDataSource`
+
+ML.NET 0.1 enabled loading data from a delimited text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a collection of objects as the input to a `LearningPipeline`. See sample usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133).
+
+#### Easier model validation with cross-validation and train-test
+
+[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is an approach to validating how well your model statistically performs. It does not require a separate test dataset, but rather uses your training data to test your model (it partitions the data so different data is used for training and testing, and it does this multiple times). [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51) is an example for doing cross-validation.
+
+Train-test is a shortcut to testing your model on a separate dataset. See example usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36).
+
+Note that the `LearningPipeline` is prepared the same way in both cases.
+
+#### Speed improvement for predictions
+
+By not creating a parallel cursor for dataviews that only have one element, we get a significant speed-up for predictions (see [#179](https://github.com/dotnet/machinelearning/issues/179) for a few measurements).
+
+#### Added daily NuGet builds of the project
+
+Daily NuGet builds of ML.NET are now available [here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML).
+
+#### Additional issues closed in this milestone
+
+[Here](https://github.com/dotnet/machinelearning/milestone/1?closed=1) is the list of issues closed as part of this milestone.
+
+### Acknowledgements
+
+Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m, forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions as part of this release!
From 87b20ce3d163a5c684fa0185013687819273dbb6 Mon Sep 17 00:00:00 2001
From: Gal Oshri
Date: Mon, 4 Jun 2018 18:19:32 -0700
Subject: [PATCH 2/3] Adding release note about TextLoader changes and additional issue/PR references

---
 docs/release-notes/0.2/release-0.2.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/release-notes/0.2/release-0.2.md b/docs/release-notes/0.2/release-0.2.md
index cc81e20924..afea58c190 100644
--- a/docs/release-notes/0.2/release-0.2.md
+++ b/docs/release-notes/0.2/release-0.2.md
@@ -26,24 +26,28 @@ Below are some of the highlights from this release.
 
 Clustering is an unsupervised learning task that groups sets of items based on their features. It identifies which items are more similar to each other than other items. This might be useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.
 
-ML.NET 0.2 exposes `KMeansPlusPlusClusterer` which implments K-Means++ clustering. [This test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs) shows how to use it.
+ML.NET 0.2 exposes `KMeansPlusPlusClusterer` which implments K-Means++ clustering. [This test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs) shows how to use it (from [#222](https://github.com/dotnet/machinelearning/pull/222)).
 
 #### Train using data objects in addition to loading data from a file: `CollectionDataSource`
 
-ML.NET 0.1 enabled loading data from a delimited text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a collection of objects as the input to a `LearningPipeline`. See sample usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133).
+ML.NET 0.1 enabled loading data from a delimited text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a collection of objects as the input to a `LearningPipeline`. See sample usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133) (from [#106](https://github.com/dotnet/machinelearning/pull/106)).
 
 #### Easier model validation with cross-validation and train-test
 
-[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is an approach to validating how well your model statistically performs. It does not require a separate test dataset, but rather uses your training data to test your model (it partitions the data so different data is used for training and testing, and it does this multiple times). [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51) is an example for doing cross-validation.
+[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is an approach to validating how well your model statistically performs. It does not require a separate test dataset, but rather uses your training data to test your model (it partitions the data so different data is used for training and testing, and it does this multiple times). [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51) is an example for doing cross-validation (from [#212](https://github.com/dotnet/machinelearning/pull/212)).
 
 Train-test is a shortcut to testing your model on a separate dataset. See example usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36).
 
-Note that the `LearningPipeline` is prepared the same way in both cases.
+Note that the `LearningPipeline` is prepared the same way in both cases.
 
 #### Speed improvement for predictions
 
 By not creating a parallel cursor for dataviews that only have one element, we get a significant speed-up for predictions (see [#179](https://github.com/dotnet/machinelearning/issues/179) for a few measurements).
 
+#### Updated `TextLoader` API
+
+The `TextLoader` API is now code generated and was updated to take explicit declarations for the columns in the data, which is required in some scenarios. See [#142](https://github.com/dotnet/machinelearning/pull/142).
+
 #### Added daily NuGet builds of the project
 
 Daily NuGet builds of ML.NET are now available [here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML).
From f026db2cdba1858b0e8bea2ddf2a4092a61bd708 Mon Sep 17 00:00:00 2001
From: Gal Oshri
Date: Mon, 4 Jun 2018 21:43:59 -0700
Subject: [PATCH 3/3] Addressing comments: fixing typos, changing formatting, and adding references

---
 docs/release-notes/0.2/release-0.2.md | 108 +++++++++++++++++---------
 1 file changed, 71 insertions(+), 37 deletions(-)

diff --git a/docs/release-notes/0.2/release-0.2.md b/docs/release-notes/0.2/release-0.2.md
index afea58c190..b31fa9a883 100644
--- a/docs/release-notes/0.2/release-0.2.md
+++ b/docs/release-notes/0.2/release-0.2.md
@@ -1,12 +1,19 @@
 # ML.NET 0.2 Release Notes
 
-We would like to thank the community for the engagement so far and helping us shape ML.NET.
+We would like to thank the community for the engagement so far and helping us
+shape ML.NET.
 
-Today we are releasing ML.NET 0.2. This release focuses on addressing questions/issues, adding clustering to the list of supported machine learning tasks, enabling using data from memory to train models, easier model validation, and more.
+Today we are releasing ML.NET 0.2. This release focuses on addressing
+questions/issues, adding clustering to the list of supported machine learning
+tasks, enabling using data from memory to train models, easier model
+validation, and more.
 
 ### Installation
 
-Supported platforms: Windows, MacOS, Linux (see [supported OS versions of .NET Core 2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md) for more detail)
+ML.NET supports Windows, MacOS, and Linux. See [supported OS versions of .NET
+Core
+2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
+for more details.
 
 You can install ML.NET NuGet from the CLI using:
 ```
@@ -22,40 +29,67 @@ Install-Package Microsoft.ML
 
 Below are some of the highlights from this release.
 
-#### New machine learning tasks: clustering
-
-Clustering is an unsupervised learning task that groups sets of items based on their features. It identifies which items are more similar to each other than other items. This might be useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.
-
-ML.NET 0.2 exposes `KMeansPlusPlusClusterer` which implments K-Means++ clustering. [This test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs) shows how to use it (from [#222](https://github.com/dotnet/machinelearning/pull/222)).
-
-#### Train using data objects in addition to loading data from a file: `CollectionDataSource`
-
-ML.NET 0.1 enabled loading data from a delimited text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a collection of objects as the input to a `LearningPipeline`. See sample usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133) (from [#106](https://github.com/dotnet/machinelearning/pull/106)).
-
-#### Easier model validation with cross-validation and train-test
-
-[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is an approach to validating how well your model statistically performs. It does not require a separate test dataset, but rather uses your training data to test your model (it partitions the data so different data is used for training and testing, and it does this multiple times). [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51) is an example for doing cross-validation (from [#212](https://github.com/dotnet/machinelearning/pull/212)).
-
-Train-test is a shortcut to testing your model on a separate dataset. See example usage [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36).
-
-Note that the `LearningPipeline` is prepared the same way in both cases.
-
-#### Speed improvement for predictions
-
-By not creating a parallel cursor for dataviews that only have one element, we get a significant speed-up for predictions (see [#179](https://github.com/dotnet/machinelearning/issues/179) for a few measurements).
-
-#### Updated `TextLoader` API
-
-The `TextLoader` API is now code generated and was updated to take explicit declarations for the columns in the data, which is required in some scenarios. See [#142](https://github.com/dotnet/machinelearning/pull/142).
-
-#### Added daily NuGet builds of the project
-
-Daily NuGet builds of ML.NET are now available [here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML).
-
-#### Additional issues closed in this milestone
-
-[Here](https://github.com/dotnet/machinelearning/milestone/1?closed=1) is the list of issues closed as part of this milestone.
+* Added clustering to the list of supported machine learning tasks
+
+    * Clustering is an unsupervised learning task that groups sets of items
+      based on their features. It identifies which items are more similar to
+      each other than other items. This might be useful in scenarios such as
+      organizing news articles into groups based on their topics, segmenting
+      users based on their shopping habits, and grouping viewers based on
+      their taste in movies.
+
+    * ML.NET 0.2 exposes `KMeansPlusPlusClusterer` which implements [K-Means++
+      clustering](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf)
+      with [Yinyang K-means
+      acceleration](https://www.microsoft.com/en-us/research/publication/yinyang-k-means-a-drop-in-replacement-of-the-classic-k-means-with-consistent-speedup/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D252149).
+      [This
+      test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs)
+      shows how to use it (from
+      [#222](https://github.com/dotnet/machinelearning/pull/222)).
+
+* Train using data objects in addition to loading data from a file using
+  `CollectionDataSource`. ML.NET 0.1 enabled loading data from a delimited
+  text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a
+  collection of objects as the input to a `LearningPipeline`. See sample usage
+  [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133)
+  (from [#106](https://github.com/dotnet/machinelearning/pull/106)).
+
+* Easier model validation with cross-validation and train-test
+
+    * [Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics))
+      is an approach to validating how well your model statistically performs.
+      It does not require a separate test dataset, but rather uses your
+      training data to test your model (it partitions the data so different
+      data is used for training and testing, and it does this multiple times).
+      [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51)
+      is an example for doing cross-validation (from
+      [#212](https://github.com/dotnet/machinelearning/pull/212)).
+
+    * Train-test is a shortcut to testing your model on a separate dataset.
+      See example usage
+      [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36).
+
+    * Note that the `LearningPipeline` is prepared the same way in both cases.
+
+* Speed improvement for predictions: by not creating a parallel cursor for
+  dataviews that only have one element, we get a significant speed-up for
+  predictions (see
+  [#179](https://github.com/dotnet/machinelearning/issues/179) for a few
+  measurements).
+
+* Updated `TextLoader` API: the `TextLoader` API is now code generated and was
+  updated to take explicit declarations for the columns in the data, which is
+  required in some scenarios. See
+  [#142](https://github.com/dotnet/machinelearning/pull/142).
+
+* Added daily NuGet builds of the project: daily NuGet builds of ML.NET are
+  now available
+  [here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML).
+
+Additional issues closed in this milestone can be found [here](https://github.com/dotnet/machinelearning/milestone/1?closed=1).
 
 ### Acknowledgements
 
-Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m, forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions as part of this release!
+Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m,
+forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions
+as part of this release!
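
Note on the cross-validation bullet in the release notes: the text says cross-validation "partitions the data so different data is used for training and testing, and it does this multiple times". The partitioning idea can be sketched in a few lines of Python. This is an illustration only, not ML.NET code, and it makes no claim about how ML.NET implements cross-validation; `k_fold_splits`, `num_folds`, and `seed` are hypothetical names chosen for this sketch.

```python
import random
from typing import List, Tuple


def k_fold_splits(
    n_items: int, num_folds: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, test_indices) pair per fold.

    Hypothetical helper for illustration; not part of any ML.NET API.
    """
    if not 2 <= num_folds <= n_items:
        raise ValueError("num_folds must be between 2 and n_items")

    # Shuffle row indices once so folds are not tied to the input order.
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices round-robin; fold sizes differ by at most one.
    folds = [indices[i::num_folds] for i in range(num_folds)]

    splits = []
    for held_out in range(num_folds):
        test = folds[held_out]
        # Train on every fold except the one held out for testing.
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for train, test in k_fold_splits(n_items=10, num_folds=5):
        print(f"train on {len(train)} rows, test on {len(test)} rows")
```

Each row lands in exactly one test fold, so every row is used for testing once and for training in the remaining rounds, which is why no separate test dataset is needed.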