diff --git a/example/jupyter/tutorial_dnn_iris.ipynb b/example/jupyter/tutorial_dnn_iris.ipynb
new file mode 100644
index 0000000000..e8a6b3a2ec
--- /dev/null
+++ b/example/jupyter/tutorial_dnn_iris.ipynb
@@ -0,0 +1,208 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Classify Iris Dataset Using DNNClassifier\n",
+    "\n",
+    "This tutorial demonstrates how to\n",
+    "1. train a DNNClassifier on the iris dataset.\n",
+    "1. use the trained DNNClassifier to predict the iris class.\n",
+    "\n",
+    "## The Dataset\n",
+    "\n",
+    "The Iris dataset contains four features and one label. The four features describe the botanical characteristics of individual Iris flowers. Each feature is stored as a single float. The label indicates the class of an individual Iris flower. It is stored as an integer and takes one of the values 0, 1, or 2.\n",
+    "\n",
+    "We have prepared the iris dataset in the tables `iris.train` and `iris.test`, which we will use as training data and test data, respectively.\n",
+    "\n",
+    "We can take a quick peek at the data by running the following standard SQL statements."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "+--------------+---------+------+-----+---------+-------+\n", + "| Field | Type | Null | Key | Default | Extra |\n", + "+--------------+---------+------+-----+---------+-------+\n", + "| sepal_length | float | YES | | None | |\n", + "| sepal_width | float | YES | | None | |\n", + "| petal_length | float | YES | | None | |\n", + "| petal_width | float | YES | | None | |\n", + "| class | int(11) | YES | | None | |\n", + "+--------------+---------+------+-----+---------+-------+" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%sqlflow\n", + "describe iris.train;" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "+--------------+-------------+--------------+-------------+-------+\n", + "| sepal_length | sepal_width | petal_length | petal_width | class |\n", + "+--------------+-------------+--------------+-------------+-------+\n", + "| 6.4 | 2.8 | 5.6 | 2.2 | 2 |\n", + "| 5.0 | 2.3 | 3.3 | 1.0 | 1 |\n", + "| 4.9 | 2.5 | 4.5 | 1.7 | 2 |\n", + "| 4.9 | 3.1 | 1.5 | 0.1 | 0 |\n", + "| 5.7 | 3.8 | 1.7 | 0.3 | 0 |\n", + "+--------------+-------------+--------------+-------------+-------+" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%sqlflow\n", + "select *\n", + "from iris.train\n", + "limit 5;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train\n", + "\n", + "Let's train a DNNClassifier, which has two hidden layers where each layer has ten hidden units. 
This can be done by specifying the training clause for SQLFlow's extended syntax.\n", + "\n", + "```\n", + "TRAIN DNNClassifier\n", + "WITH\n", + " model.n_classes = 3,\n", + " model.hidden_units = [10, 20]\n", + "```\n", + "\n", + "To specify the training data, we use standard SQL statements like `SELECT * FROM iris.train`.\n", + "\n", + "We explicit specify which column is used for features and which column is used for the label by writing\n", + "\n", + "```\n", + "COLUMN sepal_length, sepal_width, petal_length, petal_width\n", + "LABEL class\n", + "```\n", + "\n", + "At the end of the training process, we save the trained DNN model into table `sqlflow_models.my_dnn_model` by writing `INTO sqlflow_models.my_dnn_model`.\n", + "\n", + "Putting it all together, we have our first SQLFlow training statement." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Evaluation result: {'accuracy': 0.4347826, 'average_loss': 1.6101575, 'loss': 1.6101575, 'global_step': 100}\n", + "\n", + "Done training\n", + "\n" + ] + } + ], + "source": [ + "%%sqlflow\n", + "SELECT *\n", + "FROM iris.train\n", + "TRAIN DNNClassifier\n", + "WITH\n", + " model.n_classes = 3,\n", + " model.hidden_units = [10, 20],\n", + " train.epoch = 100\n", + "COLUMN sepal_length, sepal_width, petal_length, petal_width\n", + "LABEL class\n", + "INTO sqlflow_models.my_dnn_model;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predict\n", + "\n", + "SQLFlow also supports prediction out-of-the-box.\n", + "\n", + "To specify the prediction data, we use standard SQL statements like `SELECT * FROM iris.test`.\n", + "\n", + "Say we want the model, previously stored at `sqlflow_models.my_dnn_model`, to read the prediction data and write the predicted result into table `iris.predict` column `class`. We can write the following SQLFlow prediction statement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%sqlflow\n", + "SELECT *\n", + "FROM iris.test\n", + "predict iris.predict.class\n", + "USING sqlflow_models.my_dnn_model;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After the prediction, we can checkout the prediction result by" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%sqlflow\n", + "SELECT *\n", + "FROM iris.predict\n", + "LIMIT 5;" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}