12 changes: 12 additions & 0 deletions Pipfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
pandas = "*"

[requires]
python_version = "3.7"
90 changes: 90 additions & 0 deletions Pipfile.lock


1 change: 1 addition & 0 deletions Unit-3-Sprint-Challenge-2/.gitkeep
@@ -0,0 +1 @@

144 changes: 144 additions & 0 deletions Unit-3-Sprint-Challenge-2/challenge.md
@@ -0,0 +1,144 @@
# Data Science Unit 3 Sprint Challenge 2

## Databases and SQL

A SQL Query walks into a bar. In one corner of the bar are two tables. The Query
walks up to the tables and asks:

...

*"Mind if I join you?"*

---

In this sprint challenge you will write code and answer questions related to
databases, with a focus on SQL but an acknowledgment of the broader ecosystem.
You may use any tools and references you wish, but your final code should
reflect *your* work and be saved in `.py` files (*not* notebooks), and (along
with this file including your written answers) turned in directly to your TL.

For all your code, you may only import/use the following:
- other modules you write
- `sqlite3` (from the standard library)

As always, make sure to manage your time - get a section/question to "good
enough" and then move on to make sure you do everything. You can always revisit
and polish at the end if time allows.

This file is Markdown, so it may be helpful to open with VS Code or another tool
that allows you to view it nicely rendered.

Good luck!

### Part 1 - Making and populating a Database

Consider the following data:

| s | x | y |
|-----|---|---|
| 'g' | 3 | 9 |
| 'v' | 5 | 7 |
| 'f' | 8 | 7 |

Using the standard `sqlite3` module:

- Open a connection to a new (blank) database file `demo_data.sqlite3`
- Make a cursor, and execute an appropriate `CREATE TABLE` statement to accept
the above data (name the table `demo`)
- Write and execute appropriate `INSERT INTO` statements to add the data (as
shown above) to the database

Make sure to `commit()` so your data is saved! The file size should be non-zero.

Then write the following queries (also with `sqlite3`) to test:

- Count how many rows you have - it should be 3!
- How many rows are there where both `x` and `y` are at least 5?
- How many unique values of `y` are there (hint - `COUNT()` can accept a keyword
`DISTINCT`)?

Your code (to reproduce all above steps) should be saved in `demo_data.py` and
added to the repository along with the generated SQLite database.
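
The Part 1 steps can be sketched as follows. This is a minimal sketch rather than a reference solution: it connects to an in-memory database (`:memory:`) instead of `demo_data.sqlite3` so that it leaves no file behind, and the column types (`TEXT`, `INTEGER`) are an assumption based on the sample values.

```python
import sqlite3

# Assumption: ':memory:' stands in for 'demo_data.sqlite3' so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()

# Column types are assumed from the sample values in the table above.
curs.execute("CREATE TABLE demo (s TEXT, x INTEGER, y INTEGER);")
rows = [("g", 3, 9), ("v", 5, 7), ("f", 8, 7)]
curs.executemany("INSERT INTO demo (s, x, y) VALUES (?, ?, ?);", rows)
conn.commit()  # without this, a file-backed database would not keep the rows

# Total number of rows
row_count = curs.execute("SELECT COUNT(*) FROM demo;").fetchone()[0]
# Rows where both x and y are at least 5
both_at_least_5 = curs.execute(
    "SELECT COUNT(*) FROM demo WHERE x >= 5 AND y >= 5;"
).fetchone()[0]
# Number of unique values of y
distinct_y = curs.execute("SELECT COUNT(DISTINCT y) FROM demo;").fetchone()[0]

print(row_count, both_at_least_5, distinct_y)
conn.close()
```

Passing the rows as parameters (`?` placeholders) avoids building SQL strings by hand; writing three literal `INSERT INTO` statements would work equally well for data this small.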

### Part 2 - The Northwind Database

Using `sqlite3`, connect to the given `northwind_small.sqlite3` database.

![Northwind Entity-Relationship Diagram](./northwind_erd.png)

Above is an entity-relationship diagram - a picture summarizing the schema and
relationships in the database. Note that it was generated using Microsoft
Access, and some of the specific table/field names are different in the provided
data. You can see all the tables available to SQLite as follows:

```python
>>> curs.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;").fetchall()
[('Category',), ('Customer',), ('CustomerCustomerDemo',),
('CustomerDemographic',), ('Employee',), ('EmployeeTerritory',), ('Order',),
('OrderDetail',), ('Product',), ('Region',), ('Shipper',), ('Supplier',),
('Territory',)]
```

*Warning*: unlike the diagram, the table names in SQLite are singular, not plural
(they do not end in `s`). You can see the schema (`CREATE TABLE` statement)
behind any given table with:

```python
>>> curs.execute("SELECT sql FROM sqlite_master WHERE name='Customer';").fetchall()
[('CREATE TABLE "Customer" \n(\n "Id" VARCHAR(8000) PRIMARY KEY, \n
"CompanyName" VARCHAR(8000) NULL, \n "ContactName" VARCHAR(8000) NULL, \n
"ContactTitle" VARCHAR(8000) NULL, \n "Address" VARCHAR(8000) NULL, \n "City"
VARCHAR(8000) NULL, \n "Region" VARCHAR(8000) NULL, \n "PostalCode"
VARCHAR(8000) NULL, \n "Country" VARCHAR(8000) NULL, \n "Phone" VARCHAR(8000)
NULL, \n "Fax" VARCHAR(8000) NULL \n)',)]
```

In particular note that the *primary* key is `Id`, and not `CustomerId`. On
other tables (where it is a *foreign* key) it will be `CustomerId`. Also note -
the `Order` table conflicts with the `ORDER` keyword! We'll just avoid that
particular table, but it's a good lesson in the danger of keyword conflicts.
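
If a keyword-named table ever does need to be queried, SQLite accepts the name as an identifier when it is wrapped in double quotes. The sketch below illustrates this on a hypothetical stand-in table created in memory; the columns and the placeholder value are assumptions, and only the quoting is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()

# Hypothetical stand-in for the Northwind "Order" table; only the quoting matters here.
curs.execute('CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerId TEXT);')
curs.execute('INSERT INTO "Order" (CustomerId) VALUES (?);', ("CUST1",))

# Unquoted, the table name is parsed as the ORDER keyword and the statement fails.
try:
    curs.execute("SELECT COUNT(*) FROM Order;")
    unquoted_failed = False
except sqlite3.OperationalError:
    unquoted_failed = True

# Double-quoted, the same name is treated as an identifier.
order_count = curs.execute('SELECT COUNT(*) FROM "Order";').fetchone()[0]

print(unquoted_failed, order_count)
conn.close()
```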

Answer the following questions (each is from a single table):

- What are the ten most expensive items (per unit price) in the database?
- What is the average age of an employee at the time of their hiring? (Hint: a
lot of arithmetic works with dates.)
- (*Stretch*) How does the average age of employee at hire vary by city?

Your code (to load and query the data) should be saved in `northwind.py`, and
added to the repository. Do your best to answer in purely SQL, but if necessary
use Python/other logic to help.
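
As an illustration of the date-arithmetic hint, SQLite's `julianday()` function converts a date string to a day number, so subtracting two results gives an interval in days. The sketch below runs on a toy in-memory table, not the provided database: the column names `BirthDate` and `HireDate`, the ISO-8601 date strings, and the 365.25-day year are all assumptions, so check the real schema before adapting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()

# Toy table: column names and date format are assumptions for illustration only.
curs.execute(
    "CREATE TABLE Employee (Id INTEGER PRIMARY KEY, BirthDate TEXT, HireDate TEXT);"
)
curs.executemany(
    "INSERT INTO Employee (BirthDate, HireDate) VALUES (?, ?);",
    [("1980-01-01", "2010-01-01"), ("1990-01-01", "2010-01-01")],
)

# Difference in days between hire and birth, converted to (approximate) years, then averaged.
avg_age_at_hire = curs.execute(
    "SELECT AVG((julianday(HireDate) - julianday(BirthDate)) / 365.25) FROM Employee;"
).fetchone()[0]

print(round(avg_age_at_hire, 2))
conn.close()
```

Adding a `GROUP BY` on a city column to the same query is one way to approach the stretch question.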

### Part 3 - Sailing the Northwind Seas

You've answered some basic questions from the Northwind database, looking at
individual tables - now it's time to put things together, and `JOIN`!

Using `sqlite3` in `northwind.py`, answer the following:

- What are the ten most expensive items (per unit price) in the database *and*
their suppliers?
- What is the largest category (by number of unique products in it)?
- (*Stretch*) Who's the employee with the most territories? Use `TerritoryId`
(not name, region, or other fields) as the unique identifier for territories.
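
The general shape of a `JOIN` combined with `GROUP BY` can be sketched on toy data. The tables below are built in memory and are not the provided database: the column names (`CategoryName`, `ProductName`, `CategoryId`) and the rows are assumptions chosen to follow the `Id` / `<Table>Id` key convention described in Part 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()

# Toy tables; names and rows are illustrative assumptions, not Northwind data.
curs.execute("CREATE TABLE Category (Id INTEGER PRIMARY KEY, CategoryName TEXT);")
curs.execute(
    "CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT, CategoryId INTEGER);"
)
curs.executemany(
    "INSERT INTO Category (Id, CategoryName) VALUES (?, ?);",
    [(1, "Beverages"), (2, "Produce")],
)
curs.executemany(
    "INSERT INTO Product (ProductName, CategoryId) VALUES (?, ?);",
    [("Tea", 1), ("Coffee", 1), ("Apples", 2)],
)

# Join each product to its category, count unique products per category, keep the largest.
largest_category = curs.execute(
    """
    SELECT Category.CategoryName, COUNT(DISTINCT Product.Id) AS n_products
    FROM Product
    JOIN Category ON Product.CategoryId = Category.Id
    GROUP BY Category.Id
    ORDER BY n_products DESC
    LIMIT 1;
    """
).fetchone()

print(largest_category)
conn.close()
```

The foreign key (`Product.CategoryId`) is matched against the primary key of the other table (`Category.Id`); the same pattern extends to a third table when the relationship goes through a linking table.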

### Part 4 - Questions (and your Answers)

Answer the following questions, baseline ~3-5 sentences each, as if they were
interview screening questions (a form you fill out when applying for a job):

- In the Northwind database, what is the type of relationship between the
`Employee` and `Territory` tables?
- What is a situation where a document store (like MongoDB) is appropriate, and
what is a situation where it is not appropriate?
- What is "NewSQL", and what is it trying to achieve?

### Part 5 - Turn it in!
Provide all the files you wrote (`demo_data.py`, `northwind.py`), as well as
this file with your answers to part 4, directly to your TL. You're also
encouraged to include the output from your queries as docstring comments, to
facilitate grading and feedback. Thanks for your hard work!

If you got this far, check out the [larger Northwind
database](https://github.com/jpwhite3/northwind-SQLite3/blob/master/Northwind_large.sqlite.zip) -
your queries should run on it as well, with richer results.
Binary file added Unit-3-Sprint-Challenge-2/northwind_erd.png
Empty file added buddymove_holidayiq.sqlite3
Empty file.
14 changes: 14 additions & 0 deletions module1-introduction-to-sql/buddymove_holidayiq.py
@@ -0,0 +1,14 @@
import sqlite3
import pandas as pd
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00476/buddymove_holidayiq.csv")

conn = sqlite3.connect('buddymove_holidayiq.sqlite3')
df.to_sql("review", con=conn, if_exists="replace")  # (re)create the table so the queries below work on a blank database

# Count how many rows you have
print(conn.execute("SELECT COUNT(*) FROM review;").fetchall())

# First 10 rows with a Nature value of at least 100
print(conn.execute("SELECT * FROM review WHERE Nature >= 100 LIMIT 10;").fetchall())
# Average value of each review category
print(conn.execute("SELECT AVG(Sports),AVG(Religious),AVG(Nature),AVG(Theatre),AVG(Shopping),AVG(Picnic) FROM review;").fetchall())
44 changes: 44 additions & 0 deletions module1-introduction-to-sql/rpg_queries.py
@@ -0,0 +1,44 @@
import sqlite3
conn = sqlite3.connect('/Users/user/Documents/GitHub/Lambda/DS-Unit-3-Sprint-2-SQL-and-Databases/module1-introduction-to-sql/rpg_db.sqlite3')
c = conn.cursor()


# How many total Characters are there?
c1 = c
print(c1.execute('SELECT COUNT(*) FROM charactercreator_character;').fetchall())

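# How many of each specific subclass?
# NOTE: this prints the number of columns after the first two in one character row, not a per-subclass count.
print(len(c1.execute('SELECT * FROM charactercreator_character;').fetchall()[0][2:]))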

#How many total items?

print(c1.execute('SELECT COUNT(*) FROM armory_item;').fetchall())

#How many of the Items are weapons?

print(c1.execute('SELECT COUNT(*) FROM armory_weapon;').fetchall())

# How many are not?

print(len(c1.execute('SELECT * FROM armory_item;').fetchall()) - len(c1.execute('SELECT * FROM armory_weapon;').fetchall()))

# How many Items does each character have? (Return first 20 rows)

print(c1.execute('SELECT character_id, count(*) FROM charactercreator_character_inventory GROUP BY character_id LIMIT 20;').fetchall())

# How many Weapons does each character have? (Return first 20 rows)

print(c1.execute('SELECT cci.character_id,count(*) FROM armory_weapon as aw, charactercreator_character_inventory as cci WHERE cci.item_id = aw.item_ptr_id GROUP BY cci.character_id LIMIT 20;').fetchall())

# On average, how many Items does each Character have?

table = c1.execute('SELECT character_id, count(*) FROM charactercreator_character_inventory GROUP BY character_id;').fetchall()

print(sum([x[1] for x in table]) / len(table))

# On average, how many Weapons does each character have?

table = c1.execute('SELECT cci.character_id,count(*) FROM armory_weapon as aw, charactercreator_character_inventory as cci WHERE cci.item_id = aw.item_ptr_id GROUP BY cci.character_id;').fetchall()

print(sum([x[1] for x in table]) / len(table))
19 changes: 19 additions & 0 deletions module2-sql-for-analysis/Stretch_goal_postgres_and_mongo.txt
@@ -0,0 +1,19 @@
def increment(x):
    return x + 1

def double(x):
    return x * 2

def run_twice(func, arg):
    return func(func(arg))

def rec_print(n):
    print(n)
    if n > 0:
        rec_print(n-1)

def add(x,y):
    return x + y

def identity(x):
    return x