
Document ways to infer IDataView schema when loading data #28466

@luisquintanilla


Add documentation to the following article: https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/load-data-ml-net

Describe ways that you can load data into an IDataView without defining input and output schema classes. These include:

  • LoadFromTextFile method
  • LoadFromEnumerable with anonymous types

LoadFromTextFile method

Given a dataset similar to the following:

Iris-setosa,5.1,3.5,1.4,0.2
Iris-setosa,4.9,3.0,1.4,0.2
Iris-setosa,4.7,3.2,1.3,0.2

You can use the following F# code to load the data into an IDataView:

open Microsoft.ML
open Microsoft.ML.Data

let ctx = new MLContext()

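// The data is comma-separated (the default separator is a tab)
let options = new TextLoader.Options()
options.Separators <- [|','|]

// No columns are specified, so the default schema described below is used
let idv = ctx.Data.LoadFromTextFile("iris.data.txt", options)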

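This approach makes the following assumptions:

  • The first column is the label (target variable).
  • All columns are loaded as floats (Single). A value that can't be parsed as a float, such as a string, is loaded as NaN. With the sample above this includes the label: Iris-setosa is a string, so the Label column would contain NaN.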

The resulting IDataView has two columns:

  • Label
  • Features (a vector containing the values of all remaining columns)
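To check these assumptions, you can print the schema of the loaded IDataView. The F# script below is a minimal sketch that is not part of the original example: it assumes you run it with dotnet fsi (which supports the #r "nuget: ..." reference), and instead of reading iris.data.txt it writes the three sample rows to a temporary file so that it runs on its own.

```fsharp
#r "nuget: Microsoft.ML"

open System.IO
open Microsoft.ML
open Microsoft.ML.Data

// Write the sample rows to a temporary file so the script is self-contained
let path = Path.GetTempFileName()
let sampleRows =
    [| "Iris-setosa,5.1,3.5,1.4,0.2"
       "Iris-setosa,4.9,3.0,1.4,0.2"
       "Iris-setosa,4.7,3.2,1.3,0.2" |]
File.WriteAllLines(path, sampleRows)

let ctx = MLContext()

let options = TextLoader.Options()
options.Separators <- [|','|]

// No columns are specified, so the loader falls back to its default Label/Features layout
let idv = ctx.Data.LoadFromTextFile(path, options)

// Print the name and type of every column in the schema
for col in idv.Schema do
    printfn "%s: %O" col.Name col.Type
```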

LoadFromEnumerable with Anonymous Types

If your data is a collection of anonymous types (anonymous records in F#), you can pass it to the LoadFromEnumerable method, which infers the schema from the type's properties. For example:

open Microsoft.ML

let ctx = new MLContext()

// Each F# anonymous record becomes a row; its field names become column names
let reviews =
    seq {
        {| SentimentText = "This is a great steak"; Label = true |}
        {| SentimentText = "Service was bad"; Label = false |}
        {| SentimentText = "I did not like the green eggs and ham"; Label = false |}
    }

let idvAnonIEnumerable = ctx.Data.LoadFromEnumerable(reviews)
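To see which schema was inferred, you can list the columns of the resulting IDataView. The script below is a minimal sketch, assuming dotnet fsi for the #r "nuget: ..." reference and using two of the sample reviews. F# normalizes anonymous record fields into alphabetical order, so the columns may not appear in the order the fields are written; look them up by name rather than by position.

```fsharp
#r "nuget: Microsoft.ML"

open Microsoft.ML

let ctx = MLContext()

// Two of the sample reviews as F# anonymous records
let reviews =
    [ {| SentimentText = "This is a great steak"; Label = true |}
      {| SentimentText = "Service was bad"; Label = false |} ]

// The schema is inferred from the anonymous record's properties
let idv = ctx.Data.LoadFromEnumerable(reviews)

// Print the name and type of every inferred column
for col in idv.Schema do
    printfn "%s: %O" col.Name col.Type
```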
