diff --git a/machine-learning/DataSources.ipynb b/machine-learning/DataSources.ipynb
new file mode 100644
index 0000000..7dd0a8b
--- /dev/null
+++ b/machine-learning/DataSources.ipynb
@@ -0,0 +1,253 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Loading data \n",
+ "\n",
+ "Getting data into the DataFrame is the first step. The DataFrame itself supports [loading from a CSV](https://docs.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe.loadcsvfromstring?view=ml-dotnet-preview#microsoft-data-analysis-dataframe-loadcsvfromstring(system-string-system-char-system-boolean-system-string()-system-type()-system-int64-system-int32-system-boolean)). Not all data is already in a CSV file, so you can also convert an IDataView into a DataFrame. ML.NET supports loading from several different sources into an IDataView; see the [ML.NET data loading docs](https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/load-data-ml-net). \n",
+ "\n",
+ "If you run into issues, please file them in our [GitHub repo](https://github.com/dotnet/machinelearning/issues). If possible, please include the problem data set. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div><strong>Installed Packages</strong><ul><li>DataView.InteractiveExtension, 1.0.45</li><li>Microsoft.Data.Analysis, 0.19.1</li><li>Microsoft.ML, 1.7.1</li></ul></div>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "Loading extensions from `DataView.InteractiveExtension.dll`"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "Loading extensions from `Microsoft.Data.Analysis.Interactive.dll`"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "Added support IDataView to kernel .NET."
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "// Load the NuGet packages that provide the DataFrame API, IDataView support, and notebook formatting\n",
+ "\n",
+ "#r \"nuget: Microsoft.Data.Analysis, 0.19.1\"\n",
+ "#r \"nuget: DataView.InteractiveExtension, 1.0.45\"\n",
+ "#r \"nuget: Microsoft.ML, 1.7.1\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Directly from CSV\n",
+ "`DataFrame.LoadCsv` reads a CSV file directly into a DataFrame. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "var csvFilePath = @\"data/usa_hockey.csv\";"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
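+ "<table><thead><tr><th>index</th><th>Birthday</th><th>Nat</th><th>Height</th><th>Weight</th><th>DraftYear</th><th>OverallDraft</th><th>Hand</th><th>Last Name</th><th>First Name</th><th>Position</th><th>Team</th><th>GamesPlayed</th><th>Goals</th><th>Assists</th><th>Points</th><th>PIM</th><th>Shifts</th><th>TimeOnIce</th></tr></thead><tbody>",
+ "<tr><td>0</td><td>88-16-04</td><td>USA</td><td>72</td><td>218</td><td>2006</td><td>7</td><td>R</td><td>Okposo</td><td>Kyle</td><td>RW</td><td>BUF</td><td>65</td><td>19</td><td>26</td><td>45</td><td>24</td><td>1443</td><td>73983</td></tr>",
+ "<tr><td>1</td><td>90-08-10</td><td>USA</td><td>76</td><td>210</td><td>2009</td><td>114</td><td>L</td><td>Helgeson</td><td>Seth</td><td>D</td><td>N.J</td><td>9</td><td>1</td><td>0</td><td>1</td><td>15</td><td>177</td><td>7273</td></tr>",
+ "<tr><td>2</td><td>96-26-11</td><td>USA</td><td>77</td><td>203</td><td>2015</td><td>37</td><td>R</td><td>Carlo</td><td>Brandon</td><td>D</td><td>BOS</td><td>82</td><td>6</td><td>10</td><td>16</td><td>59</td><td>2080</td><td>102414</td></tr>",
+ "<tr><td>3</td><td>90-16-11</td><td>USA</td><td>74</td><td>219</td><td>&lt;null&gt;</td><td>&lt;null&gt;</td><td>L</td><td>Schaller</td><td>Tim</td><td>C</td><td>BOS</td><td>59</td><td>7</td><td>7</td><td>14</td><td>23</td><td>1035</td><td>43436</td></tr>",
+ "<tr><td>4</td><td>92-20-03</td><td>USA</td><td>72</td><td>215</td><td>2010</td><td>37</td><td>R</td><td>Faulk</td><td>Justin</td><td>D</td><td>CAR</td><td>75</td><td>17</td><td>20</td><td>37</td><td>32</td><td>1987</td><td>104133</td></tr>",
+ "<tr><td>5</td><td>94-01-05</td><td>USA</td><td>74</td><td>205</td><td>2012</td><td>120</td><td>L</td><td>Slavin</td><td>Jaccob</td><td>D</td><td>CAR</td><td>82</td><td>5</td><td>29</td><td>34</td><td>12</td><td>2135</td><td>115316</td></tr>",
+ "<tr><td>6</td><td>90-20-06</td><td>USA</td><td>75</td><td>221</td><td>2008</td><td>128</td><td>R</td><td>Pateryn</td><td>Greg</td><td>D</td><td>DAL/MTL</td><td>36</td><td>1</td><td>8</td><td>9</td><td>10</td><td>720</td><td>33312</td></tr>",
+ "<tr><td>7</td><td>90-27-05</td><td>USA</td><td>74</td><td>196</td><td>2009</td><td>198</td><td>R</td><td>Dowd</td><td>Nic</td><td>C</td><td>L.A</td><td>70</td><td>6</td><td>16</td><td>22</td><td>25</td><td>1230</td><td>52314</td></tr>",
+ "<tr><td>8</td><td>90-16-07</td><td>USA</td><td>75</td><td>221</td><td>&lt;null&gt;</td><td>&lt;null&gt;</td><td>L</td><td>Lashoff</td><td>Brian</td><td>D</td><td>DET</td><td>5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>93</td><td>3754</td></tr>",
+ "<tr><td>9</td><td>86-09-08</td><td>USA</td><td>71</td><td>197</td><td>&lt;null&gt;</td><td>&lt;null&gt;</td><td>R</td><td>Cannone</td><td>Patrick</td><td>C</td><td>MIN</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>35</td><td>1419</td></tr>",
+ "</tbody></table>"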
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "using Microsoft.Data.Analysis;\n",
+ "\n",
+ "var csvDataFrame = DataFrame.LoadCsv(csvFilePath);\n",
+ "csvDataFrame"
+ ]
+ },
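+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When the CSV text is already in memory rather than in a file, `DataFrame.LoadCsvFromString` accepts the text directly. The cell below is a minimal sketch: the `csvText` sample and the `inlineDataFrame` name are made up for illustration, and the two rows reuse values from `data/playerSalary.json`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "using Microsoft.Data.Analysis;\n",
+ "\n",
+ "// Hypothetical inline sample; the rows reuse values from data/playerSalary.json.\n",
+ "var csvText = \"Salary,Name\\n3000000,Adam Larsson\\n1600000,Andrej Sustr\\n\";\n",
+ "\n",
+ "// separator and header are spelled out for clarity; both match the documented defaults.\n",
+ "var inlineDataFrame = DataFrame.LoadCsvFromString(csvText, separator: ',', header: true);\n",
+ "inlineDataFrame"
+ ]
+ },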
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## ML.NET IDataView Loader\n",
+ "You may want to load from a different data source. ML.NET supports many different data sources, and we can convert an IDataView into a DataFrame. Find out more about IDataViews in the [IDataView design principles](https://github.com/dotnet/machinelearning/blob/main/docs/code/IDataViewDesignPrinciples.md). "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "using Microsoft.ML.Data; \n",
+ "\n",
+ "public class SalaryData\n",
+ "{\n",
+ " [LoadColumn(0)]\n",
+ " public float Salary { get; set; }\n",
+ "\n",
+ " [LoadColumn(1)]\n",
+ " public string Name { get; set; }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### From File"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "using Microsoft.ML;\n",
+ "using Microsoft.ML.Data;\n",
+ "using System;\n",
+ "using System.Collections.Generic;\n",
+ "using System.Linq;\n",
+ "\n",
+ "//Create MLContext\n",
+ "MLContext mlContext = new MLContext();\n",
+ "\n",
+ "//Load Data\n",
+ "IDataView data = mlContext.Data.LoadFromTextFile<SalaryData>(\"data/playerSalary.csv\", separatorChar: ',', hasHeader: true);\n",
+ "var df = data.ToDataFrame();\n",
+ "\n",
+ "df"
+ ]
+ },
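+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you would rather not define a class such as `SalaryData`, the column layout can also be described with `TextLoader.Column` entries. The sketch below assumes `data/playerSalary.csv` keeps `Salary` in column 0 and `Name` in column 1, as the `LoadColumn` attributes above suggest; the `columnContext`, `columns`, `columnData`, and `columnDataFrame` names are illustrative only. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "using Microsoft.ML;\n",
+ "using Microsoft.ML.Data;\n",
+ "using Microsoft.Data.Analysis;\n",
+ "\n",
+ "// A separate MLContext so this cell does not depend on earlier cells.\n",
+ "var columnContext = new MLContext();\n",
+ "\n",
+ "// Assumed layout of data/playerSalary.csv: Salary in column 0, Name in column 1.\n",
+ "var columns = new[]\n",
+ "{\n",
+ "    new TextLoader.Column(\"Salary\", DataKind.Single, 0),\n",
+ "    new TextLoader.Column(\"Name\", DataKind.String, 1)\n",
+ "};\n",
+ "\n",
+ "// Load with the explicit column description, then convert to a DataFrame.\n",
+ "IDataView columnData = columnContext.Data.LoadFromTextFile(\"data/playerSalary.csv\", columns, separatorChar: ',', hasHeader: true);\n",
+ "var columnDataFrame = columnData.ToDataFrame();\n",
+ "\n",
+ "columnDataFrame"
+ ]
+ },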
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### From JSON"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "dotnet_interactive": {
+ "language": "csharp"
+ },
+ "vscode": {
+ "languageId": "dotnet-interactive.csharp"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
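+ "<table><thead><tr><th>index</th><th>Salary</th><th>Name</th></tr></thead><tbody>",
+ "<tr><td>0</td><td>3000000</td><td>Adam Larsson</td></tr>",
+ "<tr><td>1</td><td>1600000</td><td>Andrej Sustr</td></tr>",
+ "<tr><td>2</td><td>2200000</td><td>Antoine Roussel</td></tr>",
+ "<tr><td>3</td><td>950000</td><td>Anton Rodin</td></tr>",
+ "</tbody></table>"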
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "using Newtonsoft.Json;\n",
+ "using System.Collections.Generic;\n",
+ "using System.IO;\n",
+ "\n",
+ "// Deserialize the JSON file into a list of SalaryData, then load that enumerable into an IDataView.\n",
+ "var salaries = JsonConvert.DeserializeObject<List<SalaryData>>(File.ReadAllText(\"data/playerSalary.json\"));\n",
+ "IDataView dataView = mlContext.Data.LoadFromEnumerable(salaries);\n",
+ "\n",
+ "// Convert to DataFrame\n",
+ "var jsonDataFrame = dataView.ToDataFrame(); \n",
+ "\n",
+ "jsonDataFrame"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".NET (C#)",
+ "language": "C#",
+ "name": ".net-csharp"
+ },
+ "language_info": {
+ "file_extension": ".cs",
+ "mimetype": "text/x-csharp",
+ "name": "C#",
+ "pygments_lexer": "csharp",
+ "version": "8.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/machine-learning/data/playerSalary.json b/machine-learning/data/playerSalary.json
new file mode 100644
index 0000000..876f978
--- /dev/null
+++ b/machine-learning/data/playerSalary.json
@@ -0,0 +1,18 @@
+[
+{
+ "Salary": 3000000,
+ "Name": "Adam Larsson"
+},
+{
+ "Salary": 1600000,
+ "Name": "Andrej Sustr"
+},
+{
+ "Salary": 2200000,
+ "Name": "Antoine Roussel"
+},
+{
+ "Salary": 950000,
+ "Name": "Anton Rodin"
+}
+]
\ No newline at end of file