
rewrite snowflake quickstart guide #83


Open · wants to merge 2 commits into base: main
Binary file added content/en/getting-started/.DS_Store
Binary file not shown.
255 changes: 198 additions & 57 deletions content/en/getting-started/quickstart/index.md
@@ -9,14 +9,15 @@ description: Get started with LocalStack for Snowflake in a few simple steps

## Introduction

This guide explains how to set up the Snowflake emulator and use the Snowflake CLI to interact with Snowflake resources running on your local machine. You'll learn how to create databases, schemas, and tables, set up automated data ingestion with Snowpipe, and work with S3 storage, all running locally with LocalStack.

> **Review comment:** I like this fuller example as it shows how to do something useful. However, I'd prefer not to mention S3 here. Not all our Snowflake prospects are on the AWS ecosystem and this will just confuse them. Plus I don't want `awscli` as a prerequisite to our most basic Snowflake guide.

## Prerequisites

- [`localstack` CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli)
- [LocalStack for Snowflake]({{< ref "installation" >}})
- [`awscli-local`](https://github.com/localstack/awscli-local) for interacting with LocalStack's S3 service

It is also recommended to set up an [integration]({{< ref "user-guide/integrations/" >}}) to run your SQL queries. We recommend using the [Snowflake CLI]({{< ref "user-guide/integrations/snow-cli" >}}), [DBeaver]({{< ref "user-guide/integrations/dbeaver" >}}), or the [LocalStack Web Application]({{< ref "user-guide/user-interface" >}}) for this purpose.
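
As a minimal sketch, you can register a connection for the emulator with the Snowflake CLI; the connection name and port below are assumptions you may need to adjust for your setup:

{{< command >}}
$ snow connection add \
    --connection-name localstack \
    --account test --user test --password test \
    --host snowflake.localhost.localstack.cloud \
    --port 4566
{{< / command >}}

You can then run queries against the emulator, for example `snow sql -c localstack -q "SELECT 1;"`.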

> **Review comment:** Suggested edit:
>
> LocalStack for Snowflake works with popular Snowflake integrations to run your SQL queries. This guide uses the [Snowflake CLI], but you can also use [snowSQL], [DBeaver] or the [LocalStack Web Application] for this purpose.


## Instructions

@@ -36,89 +37,229 @@
$ curl -d '{}' snowflake.localhost.localstack.cloud:4566/session
</disable-copy>
{{< / command >}}

In this quickstart, we'll create a student records pipeline that demonstrates how to:

- Create databases, schemas, and tables
- Set up S3 stages for data storage
- Configure Snowpipe for automated data ingestion
- Load sample student data from CSV files

> **Review comment:** As suggested above, it's better to leave out S3 in the quickstart guide, but we can mention it at the end or in next steps, e.g. (the spirit, not the exact words):
>
> Why Not Try
>
> - You can load data through our [Storage Integration] (currently supporting AWS S3) or using a script (see [Snowflake Drivers])
> - You can configure [Snowpipe] for automated data ingestion
> - You can continue to work with your favourite tools to develop on LocalStack for Snowflake locally, see [Integrations]

> **Review comment:** Obviously, that means you'll change much of the tutorial here. I suggest modelling it on Snowflake's own getting-started guide (https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes), which uses the more typical database-tutorial approach of loading data uploaded from the local machine. It does so through the PUT command (https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes#stage-data-files), which according to our docs we support as well (https://snowflake.localstack.cloud/user-guide/stages/#upload-data-to-the-stage). This will also significantly cut down the length of the getting-started guide and make it extremely easy to follow.


### Create database, schema & table

Create the Snowflake database named `STUDENT_RECORDS_DEMO` and use it:

```sql
CREATE DATABASE IF NOT EXISTS STUDENT_RECORDS_DEMO;
USE DATABASE STUDENT_RECORDS_DEMO;
```

The output should be:

```bash
+-----------------------------------------------------+
| status |
|-----------------------------------------------------|
| Database STUDENT_RECORDS_DEMO successfully created. |
+-----------------------------------------------------+
```

Create a Snowflake schema named `PUBLIC` and use it:

```sql
CREATE SCHEMA IF NOT EXISTS PUBLIC;
USE SCHEMA PUBLIC;
```

The output should be:

```bash
+---------------------------------------------+
| result |
|---------------------------------------------|
| public already exists, statement succeeded. |
+---------------------------------------------+
```

Finally, create the table `STUDENT_DATA` in the database:

```sql
CREATE OR REPLACE TABLE STUDENT_DATA (
    student_id VARCHAR(50),
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(200),
    enrollment_date DATE,
    gpa FLOAT,
    major VARCHAR(100)
);
```

The output should be:

```bash
+------------------------------------------+
| status |
|------------------------------------------|
| Table STUDENT_DATA successfully created. |
+------------------------------------------+
```
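
Optionally, you can double-check the table definition; `DESC TABLE` is standard Snowflake SQL, though the emulator's output format may differ:

```sql
DESC TABLE STUDENT_DATA;
```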

### Create file format & stage

Now, create a file format for CSV files:

```sql
CREATE OR REPLACE FILE FORMAT csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE;
```

The output should be:

```bash
+----------------------------------------------+
| status |
|----------------------------------------------|
| File format CSV_FORMAT successfully created. |
+----------------------------------------------+
```

You can then create a stage pointing to the S3 bucket:
```sql
CREATE OR REPLACE STAGE student_data_stage
  URL = 's3://student-records-local/data/'
  CREDENTIALS = (AWS_KEY_ID='test' AWS_SECRET_KEY='test')
  FILE_FORMAT = csv_format
  AWS_ROLE = NULL;
```

The output should be:

```bash
+-----------------------------------------------------+
| ?COLUMN? |
|-----------------------------------------------------|
| Stage area STUDENT_DATA_STAGE successfully created. |
+-----------------------------------------------------+
```

Note that the S3 bucket has not been created yet; we'll create it in an upcoming step.

### Create Snowpipe

Create a Snowpipe for automated ingestion:

```sql
CREATE OR REPLACE PIPE student_data_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO STUDENT_DATA
  FROM @student_data_stage
  PATTERN = '.*[.]csv'
  ON_ERROR = 'CONTINUE';
```

You can see the pipe details by running:

print("4. Fetching the results")
result = sf_cur_obj.fetchall()
print("Total # of rows :" , len(result))
print("Row-1 =>",result[0])
print("Row-2 =>",result[1])
finally:
sf_cur_obj.close()
```sql
DESC PIPE student_data_pipe;
```

Copy the `notification_channel` value from the output; it will be used to set up the S3 bucket event notifications.

### Create an S3 bucket

Create an S3 bucket named `student-records-local` using `awslocal`:

{{< command >}}
$ awslocal s3 mb s3://student-records-local
{{< / command >}}

You can then configure the S3 bucket notification for Snowpipe using `awslocal`:

{{< command >}}
$ awslocal s3api put-bucket-notification-configuration \
    --bucket student-records-local \
    --notification-configuration '{
      "QueueConfigurations": [
        {
          "Id": "snowpipe-ingest-notification",
          "QueueArn": "arn:aws:sqs:us-east-1:000000000000:sf-snowpipe-test",
          "Events": ["s3:ObjectCreated:*"]
        }
      ]
    }'
{{< / command >}}

Replace the `QueueArn` value with the `notification_channel` value from the Snowpipe details, if it differs.
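
If you want to confirm the notification channel exists, you can list the SQS queues in the emulator; this assumes the pipe's queue is hosted on LocalStack's SQS under the default account:

{{< command >}}
$ awslocal sqs list-queues
{{< / command >}}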

### Upload sample data

Create a new file named `student_data.csv` with sample student records:

```csv
student_id,first_name,last_name,email,enrollment_date,gpa,major
S001,John,Smith,[email protected],2023-08-15,3.75,Computer Science
S002,Alice,Johnson,[email protected],2023-08-15,3.92,Mathematics
S003,Bob,Williams,[email protected],2022-08-15,3.45,Engineering
S004,Carol,Brown,[email protected],2024-01-10,3.88,Physics
S005,David,Davis,[email protected],2023-08-15,2.95,Biology
```

Upload the CSV file to the S3 bucket using `awslocal`:

{{< command >}}
$ awslocal s3 cp student_data.csv s3://student-records-local/data/
{{< / command >}}
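
To confirm the stage can see the uploaded file, you can list its contents; `LIST` is standard Snowflake SQL, though emulator support is an assumption here:

```sql
LIST @student_data_stage;
```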

### Verify data ingestion

Now that the CSV file has been uploaded, Snowpipe should automatically ingest the data into the table. Let's verify the data was loaded successfully:

```sql
USE DATABASE STUDENT_RECORDS_DEMO;
USE SCHEMA PUBLIC;

SELECT COUNT(*) as total_students FROM STUDENT_DATA;
```

The output should be:

```bash
+----------------+
| TOTAL_STUDENTS |
|----------------|
| 5 |
+----------------+
```
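
If the count is still zero, ingestion may not have completed yet. As a sketch, you can try Snowflake's pipe status function; whether the emulator implements it is an assumption:

```sql
SELECT SYSTEM$PIPE_STATUS('student_data_pipe');
```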

Similarly, you can query student details based on their GPA:

```sql
SELECT first_name, last_name, major, gpa FROM STUDENT_DATA WHERE gpa >= 3.8;
```

The output should be:

```bash
+------------+-----------+-------------+------+
| FIRST_NAME | LAST_NAME | MAJOR       | GPA  |
|------------+-----------+-------------+------|
| Alice      | Johnson   | Mathematics | 3.92 |
| Carol      | Brown     | Physics     | 3.88 |
+------------+-----------+-------------+------+
```

Optionally, you can also query your Snowflake resources and data using the LocalStack Web Application, which provides a **Worksheet** tab to run your SQL queries.

<img src="snowflake-web-ui.png" alt="Running SQL queries using LocalStack Web Application" width="900"/>

### Destroy the local infrastructure

To stop LocalStack and remove locally created resources, use:
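
Assuming the standard `localstack` CLI from the prerequisites, this is typically:

{{< command >}}
$ localstack stop
{{< / command >}}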

> **Review comment:** This is meant for later lines, but it's greyed out and I can't seem to comment there:
> We're not going to mention Cloud Pods and have a docs ticket open to move the mention of it from the Snowflake docs. Instead, I suggest editing the persistence paragraph to something like this:
>
> ... all locally created resources are automatically removed. To persist the state of your LocalStack for Snowflake instance, please check out our guide on [State Management].
