3 changes: 3 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions .idea/DS-Unit-3-Sprint-2-SQL-and-Databases.iml

12 changes: 12 additions & 0 deletions .idea/inspectionProfiles/Project_Default.xml

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

4 changes: 4 additions & 0 deletions .idea/misc.xml

8 changes: 8 additions & 0 deletions .idea/modules.xml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

137 changes: 137 additions & 0 deletions Study Guide/Unit 3 Sprint 2 SQL and Databases Study Guide.md
@@ -0,0 +1,137 @@
# Unit 3 Sprint 2 SQL and Databases Study Guide

This study guide should reinforce and provide practice for all of the concepts you have seen in the past week.
There is a mix of written questions and coding exercises; both are equally important to prepare you for the
sprint challenge and to let you speak on these topics comfortably in interviews and on the job.

If you get stuck or are unsure of something, remember the 20-minute rule. If that doesn't help,
then research a solution with [Google](https://www.google.com) or [StackOverflow](https://www.stackoverflow.com).
Only once you have exhausted these methods should you turn to your Team Lead; they won't be there during your Sprint Challenge (SC) or an interview.
That being said, don't hesitate to ask for help if you are truly stuck.

Have fun studying!

## SQL

**Concepts:**

1. What is SQL?
- Structured Query Language: a declarative language for defining, querying, and modifying data in relational
databases, e.g. pulling exactly the rows and columns we want from specific tables
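2. What is an RDBMS?
- Relational Database Management System: software that stores data in related tables and lets us create, query, and
manage them with SQL, e.g. SQLite or PostgreSQL. (DB Browser for SQLite is a GUI for working with SQLite databases, not an RDBMS itself.)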
3. What is an ETL pipeline?
- Extract, Transform, Load: a process for moving data from a source into another system, such as a database.
  - Extract: pull the data out of its source (e.g. a CSV file or an API)
  - Transform: clean it and convert it into the form the destination expects
  - Load: insert the transformed data into the new database or structure
4. What is a schema?
- A schema defines the structure of a database: its tables, each column's name and data type (and string length where
applicable), and constraints such as primary keys, foreign keys, and NOT NULL that incoming data must satisfy

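5. What does each letter in ACID stand for? Give an explanation for each and explain why they matter.
- **A**tomicity: all statements in a transaction succeed or fail as one unit, so a failure partway through never leaves half-written data
- **C**onsistency: every transaction moves the database from one valid state to another, respecting its constraints (keys, types, checks)
- **I**solation: concurrent transactions are kept from seeing or interfering with each other's unfinished work
- **D**urability: once a transaction is committed, its changes survive crashes or power loss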
6. Explain each of the table relationships and give an example for each.
- One-to-One: Country to Capital; each country has one capital, and each capital belongs to one country.
- One-to-Many: Book to Pages; one book has many pages, but each page belongs to exactly one book.
- Many-to-Many: Books to Authors; an author can write many books, and a book can have many authors. This is usually
modeled with a junction table that stores pairs of book and author IDs.
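The schema, ACID, and many-to-many ideas above can be tried together with Python's built-in `sqlite3`. This is a minimal sketch, assuming an in-memory database and made-up `author`, `book`, and `book_author` tables (hypothetical names, not from the course material): the junction table models the many-to-many relationship, and `with conn:` groups inserts into one transaction, so a failure rolls every statement in it back (atomicity).

```python
import sqlite3

# In-memory database so the sketch leaves no file behind
conn = sqlite3.connect(':memory:')
# SQLite only enforces foreign keys when this pragma is on
conn.execute('PRAGMA foreign_keys = ON;')

# Schema: two entity tables plus a junction table (hypothetical names)
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE book_author (
    book_id INTEGER REFERENCES book(book_id),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (book_id, author_id)
);
""")

# "with conn" commits if the block succeeds and rolls back if it raises
with conn:
    conn.executemany('INSERT INTO author VALUES (?, ?);',
                     [(1, 'Author A'), (2, 'Author B')])
    conn.executemany('INSERT INTO book VALUES (?, ?);',
                     [(10, 'Book X'), (11, 'Book Y')])
    # Book X has two authors, Author A wrote two books: many-to-many
    conn.executemany('INSERT INTO book_author VALUES (?, ?);',
                     [(10, 1), (10, 2), (11, 1)])

# A failing transaction: the second insert points at a book that does not
# exist, so the first insert (Author C) is rolled back as well
try:
    with conn:
        conn.execute("INSERT INTO author VALUES (3, 'Author C');")
        conn.execute('INSERT INTO book_author VALUES (99, 3);')
except sqlite3.IntegrityError as err:
    print('Transaction rolled back:', err)

author_count = conn.execute('SELECT COUNT(*) FROM author;').fetchone()[0]
print('Authors stored:', author_count)

# Number of authors per book, resolved through the junction table
rows = conn.execute("""
SELECT b.title, COUNT(ba.author_id) AS n_authors
FROM book AS b
JOIN book_author AS ba ON b.book_id = ba.book_id
GROUP BY b.book_id
ORDER BY b.title;
""").fetchall()
print(rows)
```

Because Author C and the bad link were inserted in the same transaction, neither is kept; only the two authors from the first transaction remain.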

## Syntax
For the following section, give a brief explanation of each of the SQL commands.

1. **SELECT** - Specifies the columns to return FROM a table in the database
(`SELECT character_id, name, level FROM charactercreator_character`)
2. **WHERE** - Filters rows to those that meet a condition; conditions can be combined with AND / OR
(`WHERE character_id > 10 AND level > 10`)
3. **LIMIT** - Caps the number of rows the query returns (and therefore how many rows `.fetchall()` receives)
4. **ORDER** - Used as ORDER BY to sort results by a column; ascending by default, add DESC for descending
5. **JOIN** - Combines rows from two tables based ON a matching condition. INNER JOIN keeps only rows that match in
both tables, so unmatched rows never introduce missing values. LEFT JOIN keeps every row of the left table and fills
NULL where there is no match, which is useful for spotting rows that exist in one table but not the other. Other
types include RIGHT and FULL OUTER JOIN; there is no MIDDLE join.
6. **CREATE TABLE** - Creates a table with a given schema (column names and types); the table must exist before data can be loaded into it.
7. **INSERT** - `INSERT INTO ... VALUES ...` adds new rows to a table
8. **DISTINCT** - Keyword used with SELECT to drop duplicate rows from the result
9. **GROUP BY** - Groups rows that share the same value(s) in the given column(s) so aggregate functions (COUNT, AVG, MAX, ...) are computed per group
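10. **ORDER BY** - Sorts the result by one or more columns, e.g. `ORDER BY level DESC, name`
11. **AVG** - Aggregate function that returns the average of a numeric column (NULL values are ignored)
12. **MAX** - Aggregate function that returns the largest value in a column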
13. **AS** - Gives a column or table an alias, e.g. `AVG(level) AS avg_level` or `FROM charactercreator_character AS c`, which labels the output and shortens the query
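To see several of these commands working together, here is a small sketch using `sqlite3` and an in-memory database. The `hero` and `item` tables and their rows are made up for illustration (they are not the course's `charactercreator_character` data); the queries show WHERE/ORDER BY/LIMIT, DISTINCT, GROUP BY with AVG/MAX and AS, and the difference between INNER and LEFT JOIN.

```python
import sqlite3

# Throwaway in-memory database with two small, hypothetical tables
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute("""
CREATE TABLE hero (
    hero_id INTEGER PRIMARY KEY,
    name TEXT,
    role TEXT,
    level INT
);
""")
curs.execute("""
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    hero_id INT,
    item_name TEXT
);
""")
curs.executemany(
    'INSERT INTO hero (name, role, level) VALUES (?, ?, ?);',
    [('Aria', 'mage', 12), ('Bram', 'fighter', 8), ('Cato', 'mage', 15),
     ('Dara', 'thief', 11), ('Eli', 'fighter', 20)],
)
# INTEGER PRIMARY KEY auto-assigns ids, so Aria is 1 and Cato is 3
curs.executemany(
    'INSERT INTO item (hero_id, item_name) VALUES (?, ?);',
    [(1, 'Staff'), (1, 'Ring'), (3, 'Wand')],
)
conn.commit()

# WHERE filters, ORDER BY ... DESC sorts, LIMIT caps the row count
top_two = curs.execute("""
SELECT name, level FROM hero
WHERE level > 10
ORDER BY level DESC
LIMIT 2;
""").fetchall()

# DISTINCT removes duplicate values
roles = curs.execute(
    'SELECT DISTINCT role FROM hero ORDER BY role;').fetchall()

# GROUP BY + aggregate functions, with AS naming the output columns
per_role = curs.execute("""
SELECT role, AVG(level) AS avg_level, MAX(level) AS max_level
FROM hero
GROUP BY role
ORDER BY role;
""").fetchall()

# INNER JOIN keeps only heroes that have at least one item...
inner = curs.execute("""
SELECT h.name, COUNT(i.item_id) AS n_items
FROM hero AS h
INNER JOIN item AS i ON h.hero_id = i.hero_id
GROUP BY h.hero_id
ORDER BY h.name;
""").fetchall()

# ...while LEFT JOIN keeps every hero; COUNT skips the NULLs produced
# by unmatched rows, so heroes without items get 0
left = curs.execute("""
SELECT h.name, COUNT(i.item_id) AS n_items
FROM hero AS h
LEFT JOIN item AS i ON h.hero_id = i.hero_id
GROUP BY h.hero_id
ORDER BY h.name;
""").fetchall()

for label, rows in [('top_two', top_two), ('roles', roles),
                    ('per_role', per_role), ('inner', inner),
                    ('left', left)]:
    print(label, rows)
```

Comparing `inner` and `left` is a quick way to find rows with no match: the heroes with 0 items appear only in the LEFT JOIN result.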

## Starting From Scratch
Create a file named `study_part1.py` and complete the exercise below. The only library you should need to import is `sqlite3`.
Don't forget to be PEP8 compliant!
1. Create a new database file called `study_part1.sqlite3`
2. Create a table with the following columns
```
student - string
studied - string
grade - int
age - int
sex - string
```

3. Fill the table with the following data

```
'Lion-O', 'True', 85, 24, 'Male'
'Cheetara', 'True', 95, 22, 'Female'
'Mumm-Ra', 'False', 65, 153, 'Male'
'Snarf', 'False', 70, 15, 'Male'
'Panthro', 'True', 80, 30, 'Male'
```

4. Save your data. You can check that everything is working so far by viewing the table and data in DB Browser.

5. Write the following queries to check your work. Query outputs should be formatted for readability: don't simply print a number to the screen with no explanation; add context.

```
What is the average age? Expected Result - 48.8
What are the names of the female students? Expected Result - 'Cheetara'
How many students studied? Expected Result - 3
Return all students and all columns, sorted by student names in alphabetical order.
```
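One possible way to write the step 5 queries is sketched below. It assumes the table is named `students` (the guide lists the columns but does not fix a table name) and rebuilds the data in an in-memory database so it runs on its own; connecting to `study_part1.sqlite3` instead would query the file saved in step 4.

```python
import sqlite3

# In-memory copy of the exercise data; use 'study_part1.sqlite3' instead
# to query the file built in steps 1-4
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE students '
             '(student TEXT, studied TEXT, grade INT, age INT, sex TEXT);')
curs.executemany('INSERT INTO students VALUES (?, ?, ?, ?, ?);', [
    ('Lion-O', 'True', 85, 24, 'Male'),
    ('Cheetara', 'True', 95, 22, 'Female'),
    ('Mumm-Ra', 'False', 65, 153, 'Male'),
    ('Snarf', 'False', 70, 15, 'Male'),
    ('Panthro', 'True', 80, 30, 'Male'),
])
conn.commit()

# Each result is printed with a label instead of a bare number
avg_age = curs.execute('SELECT AVG(age) FROM students;').fetchone()[0]
print(f'Average student age: {avg_age}')

female_names = [row[0] for row in curs.execute(
    "SELECT student FROM students WHERE sex = 'Female';")]
print(f"Female students: {', '.join(female_names)}")

# 'studied' is stored as the text 'True'/'False', so compare to a string
n_studied = curs.execute(
    "SELECT COUNT(*) FROM students WHERE studied = 'True';").fetchone()[0]
print(f'Number of students who studied: {n_studied}')

all_students = curs.execute(
    'SELECT * FROM students ORDER BY student;').fetchall()
print('All students, alphabetical by name:')
for row in all_students:
    print(row)

conn.close()
```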

## Query All the Tables!

### Setup
Before we get started you'll need a few things.
1. Download the [Chinook Database here](https://github.com/bundickm/Study-Guides/blob/master/data/Chinook_Sqlite.sqlite)
2. The schema can be [found here](https://github.com/bundickm/Study-Guides/blob/master/data/Chinook%20Schema.png)
3. Create a file named `study_part2.py` and complete the exercise below. The only library you should need to import is `sqlite3`. Don't forget to be PEP8 compliant!
4. Add a connection to the chinook database so that you can answer the queries below.

### Queries
**Single Table Queries**
1. Find the average invoice total for each customer; return the details for the first 5 IDs
2. Return all columns in Customer for the first 5 customers residing in the United States
3. Which employee does not report to anyone?
4. Find the number of unique composers
5. How many rows are in the Track table?

**Joins**

6. Get the names of all Black Sabbath tracks and the albums they appear on
7. What is the most popular genre by number of tracks?
8. Find all customers that have spent over $45
9. Find the first and last name, title, and the number of customers each employee has helped. If the customer count is 0 for an employee, it doesn't need to be displayed. Order the employees from most to least customers.
10. Return the first and last name of each employee and who they report to
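Queries 3, 8, and 10 rely on three techniques that are easy to miss: `IS NULL` (not `= NULL`), `HAVING` to filter after grouping, and joining a table to itself. The sketch below shows one way to write them against the Chinook column names `Customer.CustomerId`, `Invoice.Total`, and `Employee.ReportsTo`. So that it runs without the download, it tests the queries on `build_toy_chinook()`, a hypothetical stand-in containing only those columns and a few invented rows; swap in `sqlite3.connect('Chinook_Sqlite.sqlite')` to run them on the real file.

```python
import sqlite3

# Query 3: the employee with no manager has ReportsTo = NULL
NO_MANAGER = """
SELECT FirstName, LastName
FROM Employee
WHERE ReportsTo IS NULL;
"""

# Query 8: total spent per customer, filtered after grouping with HAVING
BIG_SPENDERS = """
SELECT c.CustomerId, c.FirstName, c.LastName, SUM(i.Total) AS TotalSpent
FROM Customer AS c
JOIN Invoice AS i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
HAVING SUM(i.Total) > 45
ORDER BY TotalSpent DESC;
"""

# Query 10: join Employee to itself; LEFT JOIN keeps the top employee,
# whose manager columns come back as None
REPORTS_TO = """
SELECT e.FirstName, e.LastName,
       m.FirstName AS ManagerFirstName, m.LastName AS ManagerLastName
FROM Employee AS e
LEFT JOIN Employee AS m ON e.ReportsTo = m.EmployeeId
ORDER BY e.EmployeeId;
"""


def build_toy_chinook():
    """Hypothetical stand-in holding only the Chinook columns used above."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY,
                           FirstName TEXT, LastName TEXT);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY,
                          CustomerId INT, Total REAL);
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY,
                           FirstName TEXT, LastName TEXT, ReportsTo INT);
    INSERT INTO Customer VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO Invoice VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 10.0);
    INSERT INTO Employee VALUES (1, 'Max', 'Boss', NULL),
                                (2, 'Sue', 'Staff', 1);
    """)
    return conn


if __name__ == '__main__':
    # Replace with sqlite3.connect('Chinook_Sqlite.sqlite') for real data
    demo_conn = build_toy_chinook()
    for label, query in [('No manager', NO_MANAGER),
                         ('Spent over $45', BIG_SPENDERS),
                         ('Reports to', REPORTS_TO)]:
        print(label, demo_conn.execute(query).fetchall())
```

WHERE cannot filter on `SUM(i.Total)` because the sum only exists after grouping; HAVING is the clause that filters groups.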

## NoSQL

### Questions of Understanding

1. What is a document store?

2. What is a `key:value` pair? What data type in Python uses `key:value` pairs?

3. Give an example of when it would be best to use a SQL Database and when it would be best to use a NoSQL Database

4. What are some of the trade-offs between SQL and NoSQL?

5. What does each letter in BASE stand for? Give an explanation for each and explain why they matter.
- B
- A
- S
- E
57 changes: 57 additions & 0 deletions Study Guide/study_part1.py
@@ -0,0 +1,57 @@
"""
Study guide practicing importing data to sqlite file
"""

import sqlite3

# Connect to (or create) the SQLite database file

sl_conn = sqlite3.connect('study_part1.sqlite3')

sl_curs = sl_conn.cursor()

"""
student - string
studied - string
grade - int
age - int
sex - string
"""

sl_curs.execute("DROP TABLE IF EXISTS students;")
sl_conn.commit()

create_table = """
CREATE TABLE students (
student TEXT,
studied TEXT,
grade INT,
age INT,
sex TEXT
);
"""

sl_curs.execute(create_table)

sl_conn.commit()

students = [
    ('Lion-O', 'True', 85, 24, 'Male'),
    ('Cheetara', 'True', 95, 22, 'Female'),
    ('Mumm-Ra', 'False', 65, 153, 'Male'),
    ('Snarf', 'False', 70, 15, 'Male'),
    ('Panthro', 'True', 80, 30, 'Male'),
]

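# Parameterized insert: sqlite3 fills each ? with a value from the tuple
# and handles quoting, instead of formatting the tuple into the SQL string
insert = """
INSERT INTO students (student, studied, grade, age, sex)
VALUES (?, ?, ?, ?, ?);"""
sl_curs.executemany(insert, students)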

sl_conn.commit()

sl_curs.execute('SELECT * FROM students;')
results = sl_curs.fetchall()

print(results)

sl_conn.close()
Binary file added Study Guide/study_part1.sqlite3
Binary file not shown.
3 changes: 3 additions & 0 deletions module1-introduction-to-sql/.idea/.gitignore

4 changes: 4 additions & 0 deletions module1-introduction-to-sql/.idea/misc.xml

8 changes: 8 additions & 0 deletions module1-introduction-to-sql/.idea/modules.xml

6 changes: 6 additions & 0 deletions module1-introduction-to-sql/.idea/vcs.xml

64 changes: 64 additions & 0 deletions module1-introduction-to-sql/buddymove_holidayiq.py
@@ -0,0 +1,64 @@
import sqlite3

import pandas as pd

# read in data using pandas
df = pd.read_csv('buddymove_holidayiq.csv')

# create database for csv file
conn = sqlite3.connect('buddymove_holidayiq.sqlite3')

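# convert df into a table for SQL use; if_exists='replace' lets the
# script be re-run without a "table already exists" error
df.to_sql('reviews', con=conn, if_exists='replace')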


# function for running queries
def execute_query(cursor, query):
    """Execute `query` on `cursor` and return all result rows."""
    cursor.execute(query)
    return cursor.fetchall()


curs = conn.cursor()

# count number of rows in DB
num_rows = """
SELECT COUNT(*)
FROM reviews;
"""
# Answer: 249
results1 = execute_query(curs, num_rows)

# How many users who reviewed at least 100 in the `Nature` category also
# reviewed at least 100 in the `Shopping` category?
# ("at least 100" means >= 100)
karens = """
SELECT COUNT(*)
FROM reviews
WHERE Nature >= 100
AND Shopping >= 100;
"""
# Answer recorded with the earlier `> 100` version of this query: 78
results2 = execute_query(curs, karens)

# - (*Stretch*) What is the average number of reviews for each category?
avg_reviews = """
SELECT AVG(Sports),
AVG(Religious),
AVG(Nature),
AVG(Shopping),
AVG(Picnic),
AVG(Theatre)
FROM reviews;
"""
results3 = execute_query(curs, avg_reviews)




if __name__ == '__main__':
    print(f'Report from buddymove_holidayiq \n'
          f'Number of Users: {results1[0][0]} \n'
          f'Number of Users with at least 100 Nature reviews and '
          f'at least 100 Shopping reviews: {results2[0][0]} \n'
          f'Average number of reviews Sports: {results3[0][0]} \n'
          f'Average number of reviews Religious: {results3[0][1]} \n'
          f'Average number of reviews Nature: {results3[0][2]} \n'
          f'Average number of reviews Shopping: {results3[0][3]} \n'
          f'Average number of reviews Picnic: {results3[0][4]} \n'
          f'Average number of reviews Theatre: {results3[0][5]} \n'
          )
Binary file not shown.