rewrite snowflake quickstart guide #83
Merged
7 commits
0cd4f29 rewrite snowflake quickstart guide (HarshCasper)
f4e707c fix build (HarshCasper)
4d3e5c1 small edit (HarshCasper)
102c05f make another change (HarshCasper)
2e714cb reform the quickstart (HarshCasper)
97b41d9 switch to stack way of starting emulator (HarshCasper)
c95b6ee fix starting line (HarshCasper)
Binary file not shown.
@@ -9,22 +9,23 @@ description: Get started with LocalStack for Snowflake in a few simple steps

 ## Introduction

-This guide explains how to set up the Snowflake emulator and develop a Python program using the Snowflake Connector for Python (`snowflake-connector-python`) to interact with emulated Snowflake running on your local machine.
+This guide explains how to set up the Snowflake emulator and use Snowflake CLI to interact with Snowflake resources running on your local machine. You'll learn how to create a Snowflake database, schema, and table, upload data to a stage, and load data into the table. This quickstart is designed to help you get familiar with the Snowflake emulator and its capabilities.

 ## Prerequisites

-- [`localstack` CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli)
 - [LocalStack for Snowflake]({{< ref "installation" >}})
-- Python 3.10 or later
-- [`snowflake-connector-python` library](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-install)
+- [`localstack` CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli)
+- [Snowflake CLI]({{< ref "user-guide/integrations/snow-cli" >}})

+LocalStack for Snowflake works with popular Snowflake integrations to run your SQL queries. This guide uses the [Snowflake CLI]({{< ref "user-guide/integrations/snow-cli" >}}), but you can also use [SnowSQL]({{< ref "user-guide/integrations/snowsql" >}}), [DBeaver]({{< ref "user-guide/integrations/dbeaver" >}}) or the [LocalStack Web Application]({{< ref "user-guide/user-interface" >}}) for this purpose.

 ## Instructions

 Before you begin, pull the Snowflake emulator image (`localstack/snowflake`) and start the container:

 {{< command >}}
 $ export LOCALSTACK_AUTH_TOKEN=<your_auth_token>
-$ IMAGE_NAME=localstack/snowflake:latest localstack start
+$ localstack start --stack snowflake
 {{< / command >}}

 Check the emulator's availability by running:
@@ -36,89 +37,193 @@ $ curl -d '{}' snowflake.localhost.localstack.cloud:4566/session
 </disable-copy>
 {{< / command >}}
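Editor's note: the availability check in the hunk header can also be scripted. The following is a minimal sketch, assuming the endpoint is reached over plain HTTP (curl's default when no scheme is given) and that any 2xx status means the emulator is up; the expected response body is not shown in this diff, so the sketch does not inspect it. The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the curl command in the hunk header. curl falls back to
# HTTP when no scheme is given, so this sketch assumes HTTP as well.
SESSION_URL = "http://snowflake.localhost.localstack.cloud:4566/session"


def build_session_request(url: str = SESSION_URL) -> urllib.request.Request:
    """Build the POST request equivalent to `curl -d '{}' <url>`."""
    body = json.dumps({}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def emulator_is_up(url: str = SESSION_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx status (assumed success criterion)."""
    try:
        with urllib.request.urlopen(build_session_request(url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket errors all derive from OSError
        return False
```

Calling `emulator_is_up()` after `localstack start --stack snowflake` should return `True` once the container is ready, under the assumptions above.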

-### Connect to the Snowflake emulator
+In this quickstart, we'll create a student records database that demonstrates how to:

+- Create databases, schemas, and tables
+- Create stages and upload data using the PUT command
+- Load data from CSV files into tables
+- Query your data

-Create a new Python file named `main.py` and use the following code to connect to the Snowflake emulator:
+### Create database, schema & table

-```python
-import snowflake.connector as sf
+Create the Snowflake database named `STUDENT_RECORDS_DEMO` and use it:

-sf_conn_obj = sf.connect(
-    user="test",
-    password="test",
-    account="test",
-    database="test",
-    host="snowflake.localhost.localstack.cloud",
-)
+```sql
+CREATE DATABASE IF NOT EXISTS STUDENT_RECORDS_DEMO;
+USE DATABASE STUDENT_RECORDS_DEMO;
 ```

-Specify the `host` parameter as `snowflake.localhost.localstack.cloud` and the other parameters as `test` to avoid connecting to the real Snowflake instance.
+The output should be:

-### Create and execute a query
+```bash
++-----------------------------------------------------+
+| status                                              |
+|-----------------------------------------------------|
+| Database STUDENT_RECORDS_DEMO successfully created. |
++-----------------------------------------------------+
+```

-Extend the Python program to insert rows from a list object into the emulated Snowflake table. Create a cursor object and execute the query:
+Create a Snowflake schema named `PUBLIC` and use it:

-```python
-print("1. Insert lot of rows from a list object to Snowflake table")
-print("2. Creating a cursor object")
-sf_cur_obj = sf_conn_obj.cursor()
+```sql
+CREATE SCHEMA IF NOT EXISTS PUBLIC;
+USE SCHEMA PUBLIC;
+```

-print("3. Executing a query on cursor object")
-try:
-    sf_cur_obj.execute(
-        "create or replace table "
-        "ability(name string, skill string )")
+The output should be:

-    rows_to_insert = [('John', 'SQL'), ('Alex', 'Java'), ('Pete', 'Snowflake')]

-    sf_cur_obj.executemany(
-        " insert into ability (name, skill) values (%s,%s) " ,rows_to_insert)
+```bash
++---------------------------------------------+
+| result                                      |
+|---------------------------------------------|
+| public already exists, statement succeeded. |
++---------------------------------------------+
+```

-    sf_cur_obj.execute("select name, skill from ability")
+Last, create the table `STUDENT_DATA` in the database:

+```sql
+CREATE OR REPLACE TABLE STUDENT_DATA (
+    student_id VARCHAR(50),
+    first_name VARCHAR(100),
+    last_name VARCHAR(100),
+    email VARCHAR(200),
+    enrollment_date DATE,
+    gpa FLOAT,
+    major VARCHAR(100)
+);
+```

-    print("4. Fetching the results")
-    result = sf_cur_obj.fetchall()
-    print("Total # of rows :" , len(result))
-    print("Row-1 =>",result[0])
-    print("Row-2 =>",result[1])
-finally:
-    sf_cur_obj.close()
+The output should be:

+```bash
++------------------------------------------+
+| status                                   |
+|------------------------------------------|
+| Table STUDENT_DATA successfully created. |
++------------------------------------------+
 ```

-This program creates a table named `ability`, inserts rows, and fetches the results.
+### Create file format & stage

-### Run the Python program
+Now, create a file format for CSV files:

-Execute the Python program with:
+```sql
+CREATE OR REPLACE FILE FORMAT csv_format
+    TYPE = CSV
+    FIELD_DELIMITER = ','
+    SKIP_HEADER = 1
+    NULL_IF = ('NULL', 'null')
+    EMPTY_FIELD_AS_NULL = TRUE;
+```

-{{< command >}}
-$ python main.py
-{{< / command >}}
+The output should be:

+```bash
++----------------------------------------------+
+| status                                       |
+|----------------------------------------------|
+| File format CSV_FORMAT successfully created. |
++----------------------------------------------+
+```
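Editor's note: to make the file format options concrete, here is a rough sketch of how `SKIP_HEADER = 1`, `NULL_IF = ('NULL', 'null')`, and `EMPTY_FIELD_AS_NULL = TRUE` might be mirrored in plain Python. It approximates the intent of those options and is not how the emulator parses files; the function name and the sample text are hypothetical.

```python
import csv
import io
from typing import Optional

# Mirrors NULL_IF = ('NULL', 'null') from the file format above.
NULL_MARKERS = {"NULL", "null"}


def parse_with_csv_format(text: str, skip_header: int = 1) -> list[list[Optional[str]]]:
    """Parse comma-delimited text, skipping header rows and mapping null markers to None."""
    reader = csv.reader(io.StringIO(text), delimiter=",")  # FIELD_DELIMITER = ','
    rows = list(reader)[skip_header:]                       # SKIP_HEADER = 1
    return [
        # EMPTY_FIELD_AS_NULL = TRUE: empty strings become None as well.
        [None if (field == "" or field in NULL_MARKERS) else field for field in row]
        for row in rows
    ]


# Hypothetical three-column sample, unrelated to the student data.
sample = "id,name,major\n1,Ann,NULL\n2,,Physics\n"
parsed = parse_with_csv_format(sample)
print(parsed)
```

The header row is dropped, and both the literal `NULL` and the empty field come back as `None`.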

+Create a stage for uploading files:

+```sql
+CREATE OR REPLACE STAGE student_data_stage
+    FILE_FORMAT = csv_format;
+```

+The output should be:

+```bash
++-----------------------------------------------------+
+| ?COLUMN?                                            |
+|-----------------------------------------------------|
+| Stage area STUDENT_DATA_STAGE successfully created. |
++-----------------------------------------------------+
+```

+### Upload and load sample data

+Create a new file named `student_data.csv` with sample student records:

+```csv
+student_id,first_name,last_name,email,enrollment_date,gpa,major
+S001,John,Smith,[email protected],2023-08-15,3.75,Computer Science
+S002,Alice,Johnson,[email protected],2023-08-15,3.92,Mathematics
+S003,Bob,Williams,[email protected],2022-08-15,3.45,Engineering
+S004,Carol,Brown,[email protected],2024-01-10,3.88,Physics
+S005,David,Davis,[email protected],2023-08-15,2.95,Biology
+```
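Editor's note: the sample file can also be generated with a short script. This is a sketch only; the e-mail addresses are obscured in this page capture, so the `example.com` addresses below are hypothetical placeholders and not the values from the PR. All other fields follow the sample above.

```python
import csv
from pathlib import Path

HEADER = ["student_id", "first_name", "last_name", "email",
          "enrollment_date", "gpa", "major"]

# IDs, names, dates, GPAs and majors follow the sample above.
# The e-mail addresses are hypothetical placeholders.
STUDENTS = [
    ("S001", "John", "Smith", "john.smith@example.com", "2023-08-15", "3.75", "Computer Science"),
    ("S002", "Alice", "Johnson", "alice.johnson@example.com", "2023-08-15", "3.92", "Mathematics"),
    ("S003", "Bob", "Williams", "bob.williams@example.com", "2022-08-15", "3.45", "Engineering"),
    ("S004", "Carol", "Brown", "carol.brown@example.com", "2024-01-10", "3.88", "Physics"),
    ("S005", "David", "Davis", "david.davis@example.com", "2023-08-15", "2.95", "Biology"),
]


def write_student_csv(path: str = "student_data.csv") -> Path:
    """Write the header plus the five sample rows and return the file path."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(STUDENTS)
    return target
```

Calling `write_student_csv()` creates `student_data.csv` in the current directory, which is where the `PUT` command in the guide looks for it.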

+Upload the CSV file to the stage using the PUT command:

+```sql
+PUT file://student_data.csv @student_data_stage AUTO_COMPRESS=TRUE;
+```

+{{< alert title="Note" >}}
+Adjust the file path to the location of your `student_data.csv` file.
+{{< /alert >}}

+The output should show the file upload status:

+```bash
+source          |target             |source_size|target_size|source_compression|target_compression|status  |message|
+----------------+-------------------+-----------+-----------+------------------+------------------+--------+-------+
+student_data.csv|student_data.csv.gz|        425|        262|NONE              |GZIP              |UPLOADED|       |
+```
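Editor's note: the upload status shows a `NONE` source becoming a `GZIP` target with a `.gz` suffix. As a rough local illustration of that step, the sketch below gzip-compresses a file next to the original. The helper name is hypothetical, and the compressed size will not necessarily match the `target_size` reported by `PUT`, since the compression settings used there are not stated in this guide.

```python
import gzip
from pathlib import Path


def gzip_like_auto_compress(source: Path) -> Path:
    """Write `<name>.gz` next to `source` and return the new path."""
    target = source.with_name(source.name + ".gz")
    target.write_bytes(gzip.compress(source.read_bytes()))
    return target


def sizes(source: Path, target: Path) -> tuple[int, int]:
    """Return (source_size, target_size) in bytes, like the columns in the PUT output."""
    return source.stat().st_size, target.stat().st_size
```

For example, `gzip_like_auto_compress(Path("student_data.csv"))` yields `student_data.csv.gz`, matching the target name in the status table.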

+Now load the data from the stage into the table:

+```sql
+COPY INTO STUDENT_DATA
+FROM @student_data_stage
+ON_ERROR = 'CONTINUE';
+```
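Editor's note: conceptually, `COPY INTO` maps each CSV field by position onto the table's typed columns, and `ON_ERROR = 'CONTINUE'` keeps loading when a row fails. The sketch below imitates that idea in plain Python under simplifying assumptions. The `Student` class and both helpers are hypothetical, and Snowflake's real error handling is richer than a simple skip counter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Student:
    """Typed view of one STUDENT_DATA row (VARCHAR -> str, DATE -> date, FLOAT -> float)."""
    student_id: str
    first_name: str
    last_name: str
    email: Optional[str]
    enrollment_date: date
    gpa: float
    major: str


def to_student(fields: list[str]) -> Student:
    """Convert one positional CSV row; raises ValueError on a bad count, date, or number."""
    student_id, first, last, email, enrolled, gpa, major = fields
    return Student(student_id, first, last, email or None,
                   date.fromisoformat(enrolled), float(gpa), major)


def copy_into(rows: list[list[str]]) -> tuple[list[Student], int]:
    """Load rows, skipping any that fail conversion (loosely like ON_ERROR = 'CONTINUE')."""
    loaded: list[Student] = []
    skipped = 0
    for fields in rows:
        try:
            loaded.append(to_student(fields))
        except ValueError:
            skipped += 1
    return loaded, skipped


# One valid row from the sample data (e-mail left empty) and one hypothetical bad row.
rows = [
    ["S001", "John", "Smith", "", "2023-08-15", "3.75", "Computer Science"],
    ["S00X", "Bad", "Row", "", "not-a-date", "3.0", "History"],
]
loaded, skipped = copy_into(rows)
print(len(loaded), skipped)
```

The second row fails date conversion and is skipped, while the first still loads.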

+### Verify data loading

+```sql
+USE DATABASE STUDENT_RECORDS_DEMO;
+USE SCHEMA PUBLIC;

+SELECT COUNT(*) as total_students FROM STUDENT_DATA;
+```

 The output should be:

 ```bash
-Insert lot of rows from a list object to Snowflake table
-1. Insert lot of rows from a list object to Snowflake table
-2. Creating a cursor object
-3. Executing a query on cursor object
-4. Fetching the results
-Total # of rows : 3
-Row-1 => ('John', 'SQL')
-Row-2 => ('Alex', 'Java')
++----------------+
+| TOTAL_STUDENTS |
+|----------------|
+|              5 |
++----------------+
 ```

+Similarly, you can query the student details based on their GPA:

+```sql
+SELECT first_name, last_name, major, gpa
+FROM STUDENT_DATA
+WHERE gpa >= 3.8
+ORDER BY gpa DESC;
+```

-Verify the results by navigating to the LocalStack logs:
+The output should be:

 ```bash
-2024-02-22T06:03:13.627 INFO --- [ asgi_gw_0] localstack.request.http : POST /session/v1/login-request => 200
-2024-02-22T06:03:16.122 WARN --- [ asgi_gw_0] l.packages.core : postgresql will be installed as an OS package, even though install target is _not_ set to be static.
-2024-02-22T06:03:45.917 INFO --- [ asgi_gw_0] localstack.request.http : POST /queries/v1/query-request => 200
-2024-02-22T06:03:46.016 INFO --- [ asgi_gw_1] localstack.request.http : POST /queries/v1/query-request => 200
-2024-02-22T06:03:49.361 INFO --- [ asgi_gw_0] localstack.request.http : POST /queries/v1/query-request => 200
-2024-02-22T06:03:49.412 INFO --- [ asgi_gw_1] localstack.request.http : POST /session => 200
+FIRST_NAME|LAST_NAME|MAJOR      |GPA |
+----------+---------+-----------+----+
+Alice     |Johnson  |Mathematics|3.92|
+Carol     |Brown    |Physics    |3.88|
 ```
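Editor's note: the two expected results can be cross-checked against the sample CSV without the emulator. The sketch below applies the same count, filter, and ordering to the five sample rows in plain Python. It only confirms that the documented outputs are consistent with the sample data, not that the emulator returns them.

```python
# (first_name, last_name, major, gpa) taken from the sample CSV in this guide.
ROWS = [
    ("John", "Smith", "Computer Science", 3.75),
    ("Alice", "Johnson", "Mathematics", 3.92),
    ("Bob", "Williams", "Engineering", 3.45),
    ("Carol", "Brown", "Physics", 3.88),
    ("David", "Davis", "Biology", 2.95),
]

# SELECT COUNT(*) as total_students FROM STUDENT_DATA;
total_students = len(ROWS)

# SELECT ... WHERE gpa >= 3.8 ORDER BY gpa DESC;
high_gpa = sorted(
    (row for row in ROWS if row[3] >= 3.8),
    key=lambda row: row[3],
    reverse=True,
)

print(total_students)
for row in high_gpa:
    print(row)
```

This yields a count of 5 and the rows for Alice (3.92) and Carol (3.88), in that order, in line with the outputs shown in the diff.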

+Optionally, you can also query your Snowflake resources & data using the LocalStack Web Application, that provides a **Worksheet** tab to run your SQL queries.

+<img src="snowflake-web-ui.png" alt="Running SQL queries using LocalStack Web Application" width="900"/>

 ### Destroy the local infrastructure

 To stop LocalStack and remove locally created resources, use:

@@ -127,9 +232,17 @@ To stop LocalStack and remove locally created resources, use:
 $ localstack stop
 {{< / command >}}

-LocalStack is ephemeral and doesn't persist data across restarts. It runs inside a Docker container, and once it’s stopped, all locally created resources are automatically removed. In a future release of the Snowflake emulator, we will provide proper persistence and integration with our [Cloud Pods](https://docs.localstack.cloud/user-guide/state-management/cloud-pods/) feature as well.
+LocalStack is ephemeral and doesn't persist data across restarts. It runs inside a Docker container, and once it's stopped, all locally created resources are automatically removed. To persist the state of your LocalStack for Snowflake instance, please check out our guide on [State Management]({{< ref "user-guide/state-management" >}}).

+## Next Steps

+Now that you've completed the quickstart, here are some additional features you can explore:

+- **Load data from cloud storage**: You can load data through our [Storage Integrations]({{< ref "user-guide/storage-integrations" >}}) (currently supporting AWS S3) or using a script (see [Snowflake Drivers]({{< ref "user-guide/snowflake-drivers" >}}))
+- **Automate data ingestion**: You can configure [Snowpipe]({{< ref "user-guide/snowpipe" >}}) for automated data ingestion from external sources
+- **Use your favorite tools**: You can continue to work with your favorite tools to develop on LocalStack for Snowflake locally, see [Integrations]({{< ref "user-guide/integrations" >}})
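Editor's note: for readers who want the script route mentioned in the list above, the following sketch reuses the connection parameters from the Python example that this PR removes (`test` credentials and the `snowflake.localhost.localstack.cloud` host). Whether those parameters are still the recommended ones after the rewrite is an assumption. The helper name is hypothetical, the connector must be installed separately, and the final query assumes the quickstart's `STUDENT_DATA` table already exists.

```python
# Connection parameters copied from the removed Python example in this diff.
CONNECTION_PARAMS = {
    "user": "test",
    "password": "test",
    "account": "test",
    "database": "test",
    "host": "snowflake.localhost.localstack.cloud",
}

# A subset of the quickstart's SQL statements, run in order.
STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS STUDENT_RECORDS_DEMO",
    "USE DATABASE STUDENT_RECORDS_DEMO",
    "USE SCHEMA PUBLIC",
    "SELECT COUNT(*) AS total_students FROM STUDENT_DATA",
]


def run_statements(statements=STATEMENTS):
    """Execute each statement against the emulator and collect the fetched rows."""
    # Imported lazily so this module loads even without snowflake-connector-python.
    import snowflake.connector

    conn = snowflake.connector.connect(**CONNECTION_PARAMS)
    try:
        cur = conn.cursor()
        try:
            results = []
            for statement in statements:
                cur.execute(statement)
                results.append(cur.fetchall())
            return results
        finally:
            cur.close()
    finally:
        conn.close()
```

With the emulator running and the quickstart completed, `run_statements()[-1]` should hold the row count returned by the last query.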

-## Next steps
+## Further Reading

 You can now explore the following resources to learn more about the Snowflake emulator: