bloominstituteoftechnology · britneh · Jun 9, 2020 · Jun 9, 2020 · Jun 9, 2020 · Jun 10, 2020
diff --git a/BritneH_Unit 3 SC 2.zip b/BritneH_Unit 3 SC 2.zip
diff --git a/BritneH_Unit 3 SC 2/Instructions.md b/BritneH_Unit 3 SC 2/Instructions.md
@@ -0,0 +1,144 @@
+# Data Science Unit 3 Sprint Challenge 2
+
+## Databases and SQL
+
+A SQL Query walks into a bar. In one corner of the bar are two tables. The Query
+walks up to the tables and asks:
+
+...
+
+*"Mind if I join you?"*
+
+---
+
+In this sprint challenge you will write code and answer questions related to
+databases, with a focus on SQL but an acknowledgment of the broader ecosystem.
+You may use any tools and references you wish, but your final code should
+reflect *your* work and be saved in `.py` files (*not* notebooks), and (along
+with this file including your written answers) turned in directly to your TL.
+
+For all your code, you may only import/use the following:
+- other modules you write
+- `sqlite3` (from the standard library)
+
+As always, make sure to manage your time - get a section/question to "good
+enough" and then move on to make sure you do everything. You can always revisit
+and polish at the end if time allows.
+
+This file is Markdown, so it may be helpful to open with VS Code or another tool
+that allows you to view it nicely rendered.
+
+Good luck!
+
+### Part 1 - Making and populating a Database
+
+Consider the following data:
+
+| s   | x | y |
+|-----|---|---|
+| 'g' | 3 | 9 |
+| 'v' | 5 | 7 |
+| 'f' | 8 | 7 |
+
+Using the standard `sqlite3` module:
+
+- Open a connection to a new (blank) database file `demo_data.sqlite3`
+- Make a cursor, and execute an appropriate `CREATE TABLE` statement to accept
+  the above data (name the table `demo`)
+- Write and execute appropriate `INSERT INTO` statements to add the data (as
+  shown above) to the database
+
+Make sure to `commit()` so your data is saved! The file size should be non-zero.
+
+Then write the following queries (also with `sqlite3`) to test:
+
+- Count how many rows you have - it should be 3!
+- How many rows are there where both `x` and `y` are at least 5?
+- How many unique values of `y` are there (hint - `COUNT()` can accept a keyword
+  `DISTINCT`)?
+
+Your code (to reproduce all above steps) should be saved in `demo_data.py` and
+added to the repository along with the generated SQLite database.
+
+### Part 2 - The Northwind Database
+
+Using `sqlite3`, connect to the given `northwind_small.sqlite3` database.
+
+![Northwind Entity-Relationship Diagram](./northwind_erd.png)
+
+Above is an entity-relationship diagram - a picture summarizing the schema and
+relationships in the database. Note that it was generated using Microsoft
+Access, and some of the specific table/field names are different in the provided
+data. You can see all the tables available to SQLite as follows:
+
+```python
+>>> curs.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY
+name;").fetchall()
+[('Category',), ('Customer',), ('CustomerCustomerDemo',),
+('CustomerDemographic',), ('Employee',), ('EmployeeTerritory',), ('Order',),
+('OrderDetail',), ('Product',), ('Region',), ('Shipper',), ('Supplier',),
+('Territory',)]
+```
+
+*Warning*: unlike the diagram, the tables in SQLite are singular and not plural
+(do not end in `s`). And you can see the schema (`CREATE TABLE` statement)
+behind any given table with:
+```python
+>>> curs.execute('SELECT sql FROM sqlite_master WHERE name="Customer";').fetchall()
+[('CREATE TABLE "Customer" \n(\n  "Id" VARCHAR(8000) PRIMARY KEY, \n
+"CompanyName" VARCHAR(8000) NULL, \n  "ContactName" VARCHAR(8000) NULL, \n
+"ContactTitle" VARCHAR(8000) NULL, \n  "Address" VARCHAR(8000) NULL, \n  "City"
+VARCHAR(8000) NULL, \n  "Region" VARCHAR(8000) NULL, \n  "PostalCode"
+VARCHAR(8000) NULL, \n  "Country" VARCHAR(8000) NULL, \n  "Phone" VARCHAR(8000)
+NULL, \n  "Fax" VARCHAR(8000) NULL \n)',)]
+```
+
+In particular note that the *primary* key is `Id`, and not `CustomerId`. On
+other tables (where it is a *foreign* key) it will be `CustomerId`. Also note -
+the `Order` table conflicts with the `ORDER` keyword! We'll just avoid that
+particular table, but it's a good lesson in the danger of keyword conflicts.
+
+Answer the following questions (each is from a single table):
+
+- What are the ten most expensive items (per unit price) in the database?
+- What is the average age of an employee at the time of their hiring? (Hint: a
+  lot of arithmetic works with dates.)
+- (*Stretch*) How does the average age of employee at hire vary by city?
+
+Your code (to load and query the data) should be saved in `northwind.py`, and
+added to the repository. Do your best to answer in purely SQL, but if necessary
+use Python/other logic to help.
+
+### Part 3 - Sailing the Northwind Seas
+
+You've answered some basic questions from the Northwind database, looking at
+individual tables - now it's time to put things together, and `JOIN`!
+
+Using `sqlite3` in `northwind.py`, answer the following:
+
+- What are the ten most expensive items (per unit price) in the database *and*
+  their suppliers?
+- What is the largest category (by number of unique products in it)?
+- (*Stretch*) Who's the employee with the most territories? Use `TerritoryId`
+  (not name, region, or other fields) as the unique identifier for territories.
+
+### Part 4 - Questions (and your Answers)
+
+Answer the following questions, baseline ~3-5 sentences each, as if they were
+interview screening questions (a form you fill when applying for a job):
+
+- In the Northwind database, what is the type of relationship between the
+  `Employee` and `Territory` tables?
+- What is a situation where a document store (like MongoDB) is appropriate, and
+  what is a situation where it is not appropriate?
+- What is "NewSQL", and what is it trying to achieve?
+
+### Part 5 - Turn it in!
+Provide all the files you wrote (`demo_data.py`, `northwind.py`), as well as
+this file with your answers to part 4, directly to your TL. You're also
+encouraged to include the output from your queries as docstring comments, to
+facilitate grading and feedback. Thanks for your hard work!
+
+If you got this far, check out the [larger Northwind
+database](https://github.com/jpwhite3/northwind-SQLite3/blob/master/Northwind_large.sqlite.zip) -
+your queries should run on it as well, with richer results.
diff --git a/BritneH_Unit 3 SC 2/Part2.ipynb b/BritneH_Unit 3 SC 2/Part2.ipynb
diff --git a/BritneH_Unit 3 SC 2/demo_data.py b/BritneH_Unit 3 SC 2/demo_data.py
@@ -0,0 +1,48 @@
+import sqlite3
+conn = sqlite3.connect('demo_data.sqlite3')
+curs = conn.cursor()
+'''
+Create Table called demo ! - accidentally mispelled
+'''
+create = "CREATE TABLE demp(s VARCHAR,x INT, y INT);"
+query = create
+curs.execute(query)
+
+'''
+Fill in table:
+'''
+fill = '''
+INSERT INTO demp (s, x, y) 
+VALUES ('g', 3, 9),
+ ('v', 5, 7),
+ ('f', 8, 7)
+ ;
+ '''
+curs.execute(fill)
+conn.commit()
+
+'''
+Count number of rows:
+'''
+count_rows = '''SELECT COUNT(*) FROM demp'''
+curs.execute(count_rows)
+
+'''
+How many rows are there where both `x` and `y` are at least 5?
+'''
+x_and_y='''
+SELECT COUNT(*)
+FROM demp
+WHERE x>4 AND y>4
+'''
+curs.execute(x_and_y)
+
+'''
+How many unique values of `y` are there (hint - `COUNT()` can accept a keyword
+`DISTINCT`)?
+'''
+distincty = '''
+SELECT COUNT (DISTINCT y) as d
+FROM demp
+'''
+curs.execute(distincty)
diff --git a/BritneH_Unit 3 SC 2/demo_data.sqlite3 b/BritneH_Unit 3 SC 2/demo_data.sqlite3
diff --git a/BritneH_Unit 3 SC 2/northwind.py b/BritneH_Unit 3 SC 2/northwind.py
@@ -0,0 +1,49 @@
+import sqlite3
+conn = sqlite3.connect('northwind_small.sqlite3')
+curs = conn.cursor()
+conn.commit()
+
+'''
+What are the ten most expensive items (per unit price) in the database?
+'''
+most_expensive = '''
+SELECT * 
+FROM Product
+ORDER BY UnitPrice DESC 
+LIMIT 10;
+'''
+curs.execute(most_expensive)
+
+'''
+What is the average age of an employee at the time of their hiring?
+'''
+avg_age ='''
+SELECT AVG(HireDate - BirthDate)
+FROM Employee;
+'''
+curs.execute(avg_age)
+
+'''
+- What are the ten most expensive items (per unit price) in the database *and*
+  their suppliers?
+'''
+top_supplier = '''
+SELECT ProductName, SupplierId, UnitPrice, CompanyName
+FROM Product
+JOIN Supplier
+ON Product.SupplierId = Supplier.Id
+ORDER BY UnitPrice DESC
+LIMIT 10;
+'''
+curs.execute(top_supplier)
+
+
+'''
+- What is the largest category (by number of unique products in it)?
+'''
+
+category = '''
+SELECT CategoryId, ProductName, COUNT (DISTINCT CategoryId) as total
+FROM Product
+'''
+curs.execute(category)
diff --git a/BritneH_Unit 3 SC 2/northwind_small.sqlite3 b/BritneH_Unit 3 SC 2/northwind_small.sqlite3