Created samples for 'ProduceNgrams' and 'ProduceHashedNgrams' APIs. #3177
Conversation
foreach (var item in featureRow.Items())
    Console.Write($"{slots[item.Key]} ");
Console.WriteLine();
}
This is the concern for me right now. There is no way to get this metadata through the Transformer or through the prediction engine. The only way is through the IDataView obtained from the .Transform call.
If you have to transform the data anyway, why not just print off the transformed IDV instead of off a prediction? You can limit it to one row with a TakeRows filter.
In reply to: 271476465
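A minimal sketch of that suggestion, assuming the sample's usings (Microsoft.ML, Microsoft.ML.Data) and the 'mlContext', 'transformedDataView' and 'NgramFeatures' names that appear in the sample; the rest is illustrative:

// Limit the transformed data to a single row with TakeRows, then read the
// slot (ngram) names and the nonzero feature values straight from the IDataView.
var singleRow = mlContext.Data.TakeRows(transformedDataView, 1);
VBuffer<ReadOnlyMemory<char>> slotNames = default;
singleRow.Schema["NgramFeatures"].GetSlotNames(ref slotNames);
var slots = slotNames.GetValues();
foreach (var featureRow in singleRow.GetColumn<VBuffer<float>>(singleRow.Schema["NgramFeatures"]))
{
    foreach (var item in featureRow.Items())
        Console.Write($"{slots[item.Key]} ");
    Console.WriteLine();
}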
| new TextData(){ Text = "The value at each position corresponds to," }, | ||
| new TextData(){ Text = "the number of times Ngram occured in the data (Tf), or" }, | ||
| new TextData(){ Text = "the inverse of the number of documents that contain the Ngram (Idf), or." }, | ||
| new TextData(){ Text = "or compute both and multipy together (Tf-Idf)." }, |
love it! #Closed
var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
    .Append(mlContext.Transforms.Text.ProduceNgrams("NgramFeatures", "Tokens",
        ngramLength: 3, useAllLengths: false, weighting: NgramExtractingEstimator.WeightingCriteria.Tf));
ngramLength: 3, useAllLengths: false, weighting: NgramExtractingEstimator.WeightingCriteria.Tf
One parameter per line might be more presentable. #Resolved
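For reference, a sketch of the formatting being suggested here (the same call as in the diff above, just one parameter per line):

.Append(mlContext.Transforms.Text.ProduceNgrams(
    "NgramFeatures",
    "Tokens",
    ngramLength: 3,
    useAllLengths: false,
    weighting: NgramExtractingEstimator.WeightingCriteria.Tf));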
// This is achieved by calling 'TokenizeIntoWords' first followed by 'ProduceNgrams'.
// Please note that the length of the output feature vector depends on the Ngram settings.
var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
Add a one-line comment on why this is here. #Resolved
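A sketch of what such a comment could look like; the later revision of the sample, quoted further down in this conversation, uses similar wording:

// 'ProduceNgrams' requires a key-typed input, so the tokens are first converted with 'MapValueToKey'.
.Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))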
}

// Print the first 10 feature values.
Console.Write("Features: ");
Console.Write("Features: "); [](start = 11, length = 29)
i'd remove unnecessary printings
Actually, I'd like to keep it because it's clearer to read each line with this prefix on the console.
In reply to: 271816037
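For context, a sketch of the printing being discussed, assuming the sample's 'prediction' object with its 'NgramFeatures' float vector:

// Print the first 10 feature values, prefixed so the line is easy to spot in the console output.
Console.Write("Features: ");
for (int i = 0; i < 10; i++)
    Console.Write($"{prediction.NgramFeatures[i]:F4} ");
Console.WriteLine();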
transformedDataView.Schema["NgramFeatures"].GetSlotNames(ref slotNames);
var NgramFeaturesColumn = transformedDataView.GetColumn<VBuffer<float>>(transformedDataView.Schema["NgramFeatures"]);
var slots = slotNames.GetValues();
Console.Write("Ngrams: ");
Console.Write("Ngrams: "); [](start = 10, length = 28)
i'd remove this too.
Actually, I'd like to keep it because it's clearer to read each line with this prefix on the console.
In reply to: 271816588
// Print the length of the feature vector.
Console.WriteLine($"Number of Features: {prediction.NgramFeatures.Length}");

// Preview of the produced .
// Preview of the produced .
Add a comment about the slot names here. #Resolved
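A sketch of the slot-name preview such a comment could describe, reusing the 'transformedDataView' and 'NgramFeatures' names from the sample:

// Preview of the produced ngrams: the slot names of the 'NgramFeatures' column are the ngrams themselves.
VBuffer<ReadOnlyMemory<char>> slotNames = default;
transformedDataView.Schema["NgramFeatures"].GetSlotNames(ref slotNames);
foreach (var ngram in slotNames.DenseValues())
    Console.Write($"{ngram} ");
Console.WriteLine();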
var prediction = predictionEngine.Predict(samples[0]);

// Print the length of the feature vector.
Console.WriteLine($"Number of Features: {prediction.NgramFeatures.Length}");
is this necessary? #Resolved
var prediction = predictionEngine.Predict(samples[0]);

// Print the length of the feature vector.
Console.WriteLine($"Number of Features: {prediction.NgramFeatures.Length}");
Similar comment to the other file: is this needed? #Resolved
Codecov Report
@@ Coverage Diff @@
## master #3177 +/- ##
==========================================
+ Coverage 72.54% 72.58% +0.04%
==========================================
Files 807 807
Lines 144774 144956 +182
Branches 16208 16212 +4
==========================================
+ Hits 105022 105215 +193
+ Misses 35338 35325 -13
- Partials 4414 4416 +2
new TextData(){ Text = "Each position in the vector corresponds to a particular Ngram." },
new TextData(){ Text = "The value at each position corresponds to," },
new TextData(){ Text = "the number of times Ngram occurred in the data (Tf), or" },
new TextData(){ Text = "the inverse of the number of documents that contain the Ngram (Idf), or." },
or.
Omit this one. #Resolved
| new TextData(){ Text = "This is an example to compute Ngrams using hashing." }, | ||
| new TextData(){ Text = "Ngram is a sequence of 'N' consecutive words/tokens." }, | ||
| new TextData(){ Text = "ML.NET's ProduceHashedNgrams API produces count of Ngrams and hashes it as an index into a vector of given bit length." }, | ||
| new TextData(){ Text = "The hashing schem reduces the size of the output feature vector" }, |
schem
process? #Resolved
| Console.Write($"{prediction.NgramFeatures[i]:F4} "); | ||
|
|
||
| // Expected output: | ||
| // Number of Features: 256 |
// Number of Features: 256
If you use maximumNumberOfInverts you can also show the slot names, as in the plain ngrams sample. #Resolved
I tried that, but it's a bit complex to represent on the console since many ngrams can fall into one slot.
In reply to: 271903797
Well, you can control with that parameter how many you actually want to remember, right? So if you put 1, you will have only one of them.
In reply to: 271929298
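A sketch of what that could look like in the hashed sample; numberOfBits: 8 is an assumption chosen to match the 256-slot vector in the expected output above, and maximumNumberOfInverts: 1 keeps at most one original ngram per hash slot so slot names can be inspected:

var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
    .Append(mlContext.Transforms.Text.ProduceHashedNgrams(
        "NgramFeatures",
        "Tokens",
        numberOfBits: 8,             // 2^8 = 256 feature slots
        ngramLength: 3,
        useAllLengths: false,
        maximumNumberOfInverts: 1)); // remember at most one original ngram per slot for the slot names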
Ivanidzo4ka left a comment
// Please note that the length of the output feature vector depends on the n-gram settings.
var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
    // 'ProduceNgrams' takes key type as input. Converting the tokens into key type using 'MapValueToKey'.
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
.Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
This seems like a holdover from the internal codebase. I wonder if we should consider doing a breaking change to move this keytype conversion into the ProduceNGrams operation. The question is what other use cases do we expect to see?
Approved with comments.
Related to #1209.