208 changes: 208 additions & 0 deletions example/jupyter/tutorial_dnn_iris.ipynb
@@ -0,0 +1,208 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classify Iris Dataset Using DNNClassifer\n",
"\n",
"This tutorial demonstrates how to\n",
"1. train a DNNClassifer on iris dataset.\n",
"1. use trained DNNClassifer to predict iris class.\n",
"\n",
"## The Dataset\n",
"\n",
"The Iris data set contains four features and one label. The four features identify the botanical characteristics of individual Iris flowers. Each feature is stored as a single float number. The label indicates the class of individual Iris flowers. The label is stored as a integer and has possible value of 0, 1, 2.\n",
"\n",
"We have prepared the iris dataset in table `iris.train` and `iris.test`. We will be using them as training data and test data respectively.\n",
"\n",
"We can have a quick peek of the data by running the following standard SQL statements."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"+--------------+---------+------+-----+---------+-------+\n",
"| Field | Type | Null | Key | Default | Extra |\n",
"+--------------+---------+------+-----+---------+-------+\n",
"| sepal_length | float | YES | | None | |\n",
"| sepal_width | float | YES | | None | |\n",
"| petal_length | float | YES | | None | |\n",
"| petal_width | float | YES | | None | |\n",
"| class | int(11) | YES | | None | |\n",
"+--------------+---------+------+-----+---------+-------+"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sqlflow\n",
"describe iris.train;"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"+--------------+-------------+--------------+-------------+-------+\n",
"| sepal_length | sepal_width | petal_length | petal_width | class |\n",
"+--------------+-------------+--------------+-------------+-------+\n",
"| 6.4 | 2.8 | 5.6 | 2.2 | 2 |\n",
"| 5.0 | 2.3 | 3.3 | 1.0 | 1 |\n",
"| 4.9 | 2.5 | 4.5 | 1.7 | 2 |\n",
"| 4.9 | 3.1 | 1.5 | 0.1 | 0 |\n",
"| 5.7 | 3.8 | 1.7 | 0.3 | 0 |\n",
"+--------------+-------------+--------------+-------------+-------+"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sqlflow\n",
"select *\n",
"from iris.train\n",
"limit 5;"
]
},
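{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before training, it can also be handy to know how many examples the training table holds. The following is an ordinary SQL aggregate rather than SQLFlow-specific syntax; the exact count depends on how the demo data was loaded, so treat it as an optional sanity check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sqlflow\n",
"SELECT COUNT(*) AS train_examples\n",
"FROM iris.train;"
]
},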
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Let's train a DNNClassifier, which has two hidden layers where each layer has ten hidden units. This can be done by specifying the training clause for SQLFlow's extended syntax.\n",
"\n",
"```\n",
"TRAIN DNNClassifier\n",
"WITH\n",
" model.n_classes = 3,\n",
" model.hidden_units = [10, 20]\n",
"```\n",
"\n",
"To specify the training data, we use standard SQL statements like `SELECT * FROM iris.train`.\n",
"\n",
"We explicit specify which column is used for features and which column is used for the label by writing\n",
"\n",
"```\n",
"COLUMN sepal_length, sepal_width, petal_length, petal_width\n",
"LABEL class\n",
"```\n",
"\n",
"At the end of the training process, we save the trained DNN model into table `sqlflow_models.my_dnn_model` by writing `INTO sqlflow_models.my_dnn_model`.\n",
"\n",
"Putting it all together, we have our first SQLFlow training statement."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluation result: {'accuracy': 0.4347826, 'average_loss': 1.6101575, 'loss': 1.6101575, 'global_step': 100}\n",
"\n",
"Done training\n",
"\n"
]
}
],
"source": [
"%%sqlflow\n",
"SELECT *\n",
"FROM iris.train\n",
"TRAIN DNNClassifier\n",
"WITH\n",
" model.n_classes = 3,\n",
" model.hidden_units = [10, 20],\n",
" train.epoch = 100\n",
"COLUMN sepal_length, sepal_width, petal_length, petal_width\n",
"LABEL class\n",
"INTO sqlflow_models.my_dnn_model;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predict\n",
"\n",
"SQLFlow also supports prediction out-of-the-box.\n",
"\n",
"To specify the prediction data, we use standard SQL statements like `SELECT * FROM iris.test`.\n",
"\n",
"Say we want the model, previously stored at `sqlflow_models.my_dnn_model`, to read the prediction data and write the predicted result into table `iris.predict` column `class`. We can write the following SQLFlow prediction statement."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sqlflow\n",
"SELECT *\n",
"FROM iris.test\n",
"predict iris.predict.class\n",
"USING sqlflow_models.my_dnn_model;"
]
},
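{
"cell_type": "markdown",
"metadata": {},
"source": [
"The prediction statement above fills the `class` column of `iris.predict` with the predicted label for each row read from `iris.test`. As an optional sanity check (assuming the statement has finished), a standard SQL aggregate shows how the predicted classes are distributed before we inspect individual rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sqlflow\n",
"SELECT class, COUNT(*) AS cnt\n",
"FROM iris.predict\n",
"GROUP BY class;"
]
},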
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the prediction, we can checkout the prediction result by"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sqlflow\n",
"SELECT *\n",
"FROM iris.predict\n",
"LIMIT 5;"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}