Add run SQLFlow with hive server tutorial #868
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| # How SQLFlow connects with Hive | ||
|
|
||
| This document is a tutorial on how SQLFlow connects to Hive via [HiveServer2](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview). | ||
|
|
||
| ## Connect to an existing Hive server | ||
|
|
||
| To connect to an existing Hive server instance, we only need to configure a `datasource` string in the following format: | ||
|
|
||
| ``` text | ||
| hive://user:password@ip:port/dbname[?auth=<auth_mechanism>&session.<cfg_key1>=<cfg_value1>...&session.<cfg_keyN>=<cfg_valueN>] | ||
| ``` | ||
|
|
||
| In the above format, | ||
|
|
||
| - `user:password` is the username and password for HiveServer2. | ||
| - `ip:port` is the endpoint on which the HiveServer2 instance listens. | ||
| - `dbname` is the default database name. | ||
| - `auth_mechanism` is the authentication mechanism of HiveServer2; it can be `NOSASL` for unsecured transport or `PLAIN` for SASL transport. | ||
| - Parameters with the prefix `session.` are session configurations for the Hive Thrift API; for example, `session.mapreduce_job_queuename=mr` implies `mapreduce.job.queuename=mr`. | ||
|
|
||
| You can find more details at [gohive](https://sql-machine-learning.github.io/doc_index/gohive.html). | ||
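To make the format above concrete, here is a minimal Python sketch that splits a `datasource` string into the parts described in the list. It is not how SQLFlow or gohive parse the string. The function name `parse_hive_datasource` is hypothetical, and the underscore-to-dot mapping for `session.` keys is an assumption inferred from the single `mapreduce_job_queuename` example; the real driver may behave differently.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_hive_datasource(datasource):
    """Split a hive:// datasource string into its documented parts.

    Hypothetical helper for illustration only; not part of SQLFlow or gohive.
    """
    parts = urlsplit(datasource)
    if parts.scheme != "hive":
        raise ValueError(f"expected a hive:// datasource, got scheme {parts.scheme!r}")

    auth = None
    session = {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "auth":
            auth = value
        elif key.startswith("session."):
            # Assumption: underscores stand in for dots, as in the
            # session.mapreduce_job_queuename example in the text.
            hive_key = key[len("session."):].replace("_", ".")
            session[hive_key] = value

    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
        "auth": auth,
        "session": session,
    }
```

For example, `parse_hive_datasource("hive://root:root@localhost:10000/iris")` yields `dbname` equal to `"iris"`, `auth` equal to `None`, and an empty `session` dictionary.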
|
|
||
| Using the `datasource` string, you can launch an all-in-one Docker container by running: | ||
|
|
||
| ``` bash | ||
| docker run --rm -p 8888:8888 sqlflow/sqlflow bash -c \ | ||
| "sqlflowserver --datasource='hive://root:root@localhost:10000/iris' & | ||
| SQLFLOW_SERVER=localhost:50051 jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --NotebookApp.token=''" | ||
| ``` | ||
|
|
||
| Then you can open a web browser and go to `localhost:8888`. There are many SQLFlow tutorials, e.g. `tutorial_dnn_iris.ipynb`. You can follow the tutorials and substitute the data for your own use. | ||
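Because the command starts `sqlflowserver` in the background and Jupyter right after it, the notebook page may not respond immediately. The following Python sketch polls a TCP port until it accepts connections. The helper name `wait_for_port` and its timeout defaults are assumptions made for illustration; only port `8888` comes from the command above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration; not part of SQLFlow.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (uncomment to run against a started container):
# if wait_for_port("localhost", 8888):
#     print("Jupyter Notebook is reachable at localhost:8888")
```

This only checks that the port is open; it does not verify that the notebook server or SQLFlow is healthy.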
|
|
||
| ## Connect to a standalone Hive server for testing | ||
|
|
||
| We also provide a standalone Hive server Docker image for testing. | ||
|
|
||
| ### Connect to the Hive server with NOSASL transport | ||
|
|
||
| Launch your standalone hive server Docker container by running: | ||
Yancey0623 marked this conversation as resolved.
|
||
|
|
||
| ``` bash | ||
| > docker run -d -p 8888:8888 --name=hive sqlflow/gohive:dev | ||
| ``` | ||
|
|
||
| This implies the following setting in `hive-site.xml`: | ||
|
|
||
| ``` text | ||
| hive.server2.authentication=NOSASL | ||
| ``` | ||
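The setting above is written in `key=value` shorthand, whereas `hive-site.xml` itself stores each setting as a `<property>` element with `<name>` and `<value>` children inside `<configuration>`. The Python sketch below renders the shorthand into that XML shape. The function name `to_hive_site_xml` is hypothetical, and how the `sqlflow/gohive:dev` image actually writes its configuration is not shown in this tutorial.

```python
import xml.etree.ElementTree as ET


def to_hive_site_xml(settings):
    """Render a dict of Hive settings as a hive-site.xml style document.

    Hypothetical helper for illustration; the Docker image may generate
    its configuration differently.
    """
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(to_hive_site_xml({"hive.server2.authentication": "NOSASL"}))
```

The output is a single unindented line; a real `hive-site.xml` is usually pretty-printed and contains many more properties.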
|
|
||
| Test SQLFlow by running the tutorials in Jupyter Notebook: | ||
|
|
||
| ``` bash | ||
| > docker run --rm --net=container:hive sqlflow/sqlflow \ | ||
|
Do we need to add
The sqlflow Docker container shared the network stack of hive container by |
||
| bash -c "sqlflowserver --datasource='hive://root:root@localhost:10000/' & | ||
| SQLFLOW_SERVER=localhost:50051 jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --NotebookApp.token=''" | ||
| ``` | ||
|
|
||
| ### Connect to the Hive server with PLAIN SASL transport | ||
|
|
||
| This section uses [PAM](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PluggableAuthenticationModules(PAM)) authentication for the demonstration. | ||
|
|
||
| Launch your standalone Hive server Docker container with PAM authentication enabled: | ||
|
|
||
| ``` bash | ||
| > docker run -d -e WITH_HS2_PAM_AUTH=ON -p 8888:8888 --name=hive sqlflow/gohive:dev | ||
| ``` | ||
|
|
||
| This implies the following settings in `hive-site.xml`: | ||
|
|
||
| ``` text | ||
| hive.server2.authentication=PAM | ||
| hive.server2.authentication.pam.services=login,sshd | ||
| ``` | ||
|
|
||
| Test SQLFlow by running the tutorials in Jupyter Notebook: | ||
|
|
||
| ``` bash | ||
| > docker run --rm --net=container:hive sqlflow/sqlflow \ | ||
| bash -c "sqlflowserver --datasource='hive://sqlflow:sqlflow@localhost:10000/?auth=PLAIN' & | ||
| SQLFLOW_SERVER=localhost:50051 jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --NotebookApp.token=''" | ||
| ``` | ||
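The two test setups differ only in the credentials and the `auth` query parameter of the `datasource` string. The Python sketch below captures that difference. The function `build_datasource` and the mapping table are hypothetical; they cover only the NOSASL and PAM cases shown in this tutorial and are not a statement about other HiveServer2 authentication modes.

```python
# Mapping from the server-side hive.server2.authentication value to the
# client-side auth parameter, taken from the two cases in this tutorial.
# None means the tutorial's datasource string carries no auth parameter.
AUTH_PARAM_BY_SERVER_MODE = {"NOSASL": None, "PAM": "PLAIN"}


def build_datasource(server_mode, user, password,
                     host="localhost", port=10000, dbname=""):
    """Build a hive:// datasource string for one of the tutorial's setups.

    Hypothetical helper for illustration; credentials are inserted as-is,
    so characters that need URL escaping are not handled.
    """
    if server_mode not in AUTH_PARAM_BY_SERVER_MODE:
        raise ValueError(f"server mode not covered by this sketch: {server_mode!r}")
    datasource = f"hive://{user}:{password}@{host}:{port}/{dbname}"
    auth = AUTH_PARAM_BY_SERVER_MODE[server_mode]
    if auth is not None:
        datasource += f"?auth={auth}"
    return datasource


print(build_datasource("NOSASL", "root", "root"))
print(build_datasource("PAM", "sqlflow", "sqlflow"))
```

The two printed strings match the `--datasource` values used in the NOSASL and PAM commands of this tutorial.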
Missing dot: `session<cfg_keyN>=valueN]` => `session.<cfg_keyN>=valueN]`