diff --git a/module1-introduction-to-sql/Pipfile b/module1-introduction-to-sql/Pipfile new file mode 100644 index 00000000..b5846df1 --- /dev/null +++ b/module1-introduction-to-sql/Pipfile @@ -0,0 +1,11 @@ +[[source]] +name = "pypi" +url = "https://pypi.org/simple" +verify_ssl = true + +[dev-packages] + +[packages] + +[requires] +python_version = "3.8" diff --git a/module1-introduction-to-sql/README.md b/module1-introduction-to-sql/README.md index 40497956..68a7c5ae 100644 --- a/module1-introduction-to-sql/README.md +++ b/module1-introduction-to-sql/README.md @@ -1,139 +1,139 @@ -# Introduction to SQL - -The basics of Structured Query Language, a relatively simple query language. - -## Learning Objectives - -- Write basic SQL queries to get specific subsets of data from a database and - answer basic "business questions" -- Understand the purpose of SQL join, and perform joins to access data from - multiple tables - -## Before Lecture - -The Python Standard Library includes a module -[sqlite3](https://docs.python.org/3/library/sqlite3.html), an API for data -persistence via the SQLite - a simple disk-based database that doesn't require a -separate server process. Read the tutorial, and try the given examples. See if -you can modify them in simple ways, and come with questions! - -Also, check out the [DB Browser for SQLite](https://sqlitebrowser.org) - we'll -emphasize using `sqlite3` from Python so we can do things programmatically, but -it is encouraged to install the DB Browser as a helpful utility for ad hoc -inspection and querying. - -## Live Lecture Task - -We'll work together with SQLite in Python, making and exploring a simple -database and trying a range of basic queries. 
Focus will be on the following SQL -keywords: - -- `SELECT` - how we choose which columns to get -- `WHERE` - how we set conditions on the rows to be returned -- `LIMIT` - when we only want a certain number of rows -- `ORDER` - when we want to sort the output -- `JOIN` - when we need data from multiple tables combined - -We'll also learn about how to use `CREATE TABLE` to specify a schema for our -data, and `INSERT` statements to put data into a table. And lastly, we'll learn -how to calculate some basic statistics with `COUNT()`, `AVG()`, and `SUM()`, -organized using the keyword `GROUP`. - -## Assignment - Part 1, Querying a Database - -This directory contains a file `rpg_db.sqlite3`, a database for a hypothetical -webapp role-playing game. This test data has dozens-to-hundreds of randomly -generated characters across the base classes (Fighter, Mage, Cleric, and Thief) -as well as a few Necromancers. Also generated are Items, Weapons, and -connections from characters to them. Note that, while the name field was -randomized, the numeric and boolean fields were left as defaults. - -Use `sqlite3` to load and write queries to explore the data, and answer the -following questions: - -- How many total Characters are there? -- How many of each specific subclass? -- How many total Items? -- How many of the Items are weapons? How many are not? -- How many Items does each character have? (Return first 20 rows) -- How many Weapons does each character have? (Return first 20 rows) -- On average, how many Items does each Character have? -- On average, how many Weapons does each character have? - -You do not need all the tables - in particular, the `account_*`, `auth_*`, -`django_*`, and `socialaccount_*` tables are for the application and do not have -the data you need. the `charactercreator_*` and `armory_*` tables and where you -should focus your attention. 
`armory_item` and `charactercreator_character` are -the main tables for Items and Characters respectively - the other tables are -subsets of them by type (i.e. subclasses), connected via a key (`item_id` and -`character_id`). - -You can use the DB Browser or other tools to explore the data and work on your -queries if you wish, but to complete the assignment you should write a file -`rpg_queries.py` that imports `sqlite3` and programmatically executes and -reports results for the above queries. - -Some of these queries are challenging - that's OK! You can keep working on them -tomorrow as well (we'll visit loading the same data into PostgreSQL). It's also -OK to figure out the results partially with a query and partially with a bit of -logic or math afterwards, though doing things purely with SQL is a good goal. -[Subqueries](https://www.w3resource.com/sql/subqueries/understanding-sql-subqueries.php) -and [aggregation functions](https://www.sqltutorial.org/sql-aggregate-functions/) -may be helpful for putting together more complicated queries. - -## Assigment - Part 2, Making and populating a Database - -Load the data (use `pandas`) from the provided file `buddymove_holidayiq.csv` -(the [BuddyMove Data -Set](https://archive.ics.uci.edu/ml/datasets/BuddyMove+Data+Set)) - you should -have 249 rows, 7 columns, and no missing values. The data reflects the number of -place reviews by given users across a variety of categories (sports, parks, -malls, etc.). - -Using the standard `sqlite3` module: - -- Open a connection to a new (blank) database file `buddymove_holidayiq.sqlite3` -- Use `df.to_sql` - ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)) - to insert the data into a new table `review` in the SQLite3 database - -Then write the following queries (also with `sqlite3`) to test: - -- Count how many rows you have - it should be 249! 
-- How many users who reviewed at least 100 `Nature` in the category also - reviewed at least 100 in the `Shopping` category? -- (*Stretch*) What are the average number of reviews for each category? - -Your code (to reproduce all above steps) should be saved in -`buddymove_holidayiq.py`, and added to the repository along with the generated -SQLite database. - -## Resources and Stretch Goals - -For a more complicated example SQLite database with a number of tables to play -with, check out this [SQLite Sample -Database](https://www.sqlitetutorial.net/sqlite-sample-database/). - -The RPG data also exists in a [JSON -file](https://github.com/LambdaSchool/Django-RPG/blob/master/testdata.json) - -try loading it with the standard [json -module](https://docs.python.org/3.5/library/json.html), and reproducing the -above queries with direct manipulation of the Python dictionaries. Also, try to -load it into a `pandas` dataframe and reproduce the above queries with -appropriate dataframe function calls. - -This database is part of a Django (Python webapp framework) application, the -[Django-RPG](https://github.com/LambdaSchool/Django-RPG/tree/master) - check it -out, and (though this is very much a stretch goal) you can [get started with -Django](https://www.djangoproject.com/start/) and see if you can run it -(definitely use `pipenv`!). If you are able to, then you can use the the [Django -ORM](https://docs.djangoproject.com/en/2.1/topics/db/) (object-relational -mapping, a way to interact with SQL through programming language objects), and -[query](https://docs.djangoproject.com/en/2.1/topics/db/queries/) the data. -You'll find that the questions we answered with pure SQL are remarkably simple -to answer using the ORM. - -If you need one more stretch goal - the RPG data was generated using -[django-autofixture](https://github.com/volrath/django-autofixture), a tool that -facilitates tests by randomly generating data. 
Check it out, and if you got -Django working, see if you can generate more data. +# Introduction to SQL + +The basics of Structured Query Language, a relatively simple query language. + +## Learning Objectives + +- Write basic SQL queries to get specific subsets of data from a database and + answer basic "business questions" +- Understand the purpose of SQL join, and perform joins to access data from + multiple tables + +## Before Lecture + +The Python Standard Library includes a module +[sqlite3](https://docs.python.org/3/library/sqlite3.html), an API for data +persistence via the SQLite - a simple disk-based database that doesn't require a +separate server process. Read the tutorial, and try the given examples. See if +you can modify them in simple ways, and come with questions! + +Also, check out the [DB Browser for SQLite](https://sqlitebrowser.org) - we'll +emphasize using `sqlite3` from Python so we can do things programmatically, but +it is encouraged to install the DB Browser as a helpful utility for ad hoc +inspection and querying. + +## Live Lecture Task + +We'll work together with SQLite in Python, making and exploring a simple +database and trying a range of basic queries. Focus will be on the following SQL +keywords: + +- `SELECT` - how we choose which columns to get +- `WHERE` - how we set conditions on the rows to be returned +- `LIMIT` - when we only want a certain number of rows +- `ORDER` - when we want to sort the output +- `JOIN` - when we need data from multiple tables combined + +We'll also learn about how to use `CREATE TABLE` to specify a schema for our +data, and `INSERT` statements to put data into a table. And lastly, we'll learn +how to calculate some basic statistics with `COUNT()`, `AVG()`, and `SUM()`, +organized using the keyword `GROUP`. + +## Assignment - Part 1, Querying a Database + +This directory contains a file `rpg_db.sqlite3`, a database for a hypothetical +webapp role-playing game. 
This test data has dozens-to-hundreds of randomly +generated characters across the base classes (Fighter, Mage, Cleric, and Thief) +as well as a few Necromancers. Also generated are Items, Weapons, and +connections from characters to them. Note that, while the name field was +randomized, the numeric and boolean fields were left as defaults. + +Use `sqlite3` to load and write queries to explore the data, and answer the +following questions: + +- How many total Characters are there? +- How many of each specific subclass? +- How many total Items? +- How many of the Items are weapons? How many are not? +- How many Items does each character have? (Return first 20 rows) +- How many Weapons does each character have? (Return first 20 rows) +- On average, how many Items does each Character have? +- On average, how many Weapons does each character have? + +You do not need all the tables - in particular, the `account_*`, `auth_*`, +`django_*`, and `socialaccount_*` tables are for the application and do not have +the data you need. The `charactercreator_*` and `armory_*` tables are where you +should focus your attention. `armory_item` and `charactercreator_character` are +the main tables for Items and Characters respectively - the other tables are +subsets of them by type (i.e. subclasses), connected via a key (`item_id` and +`character_id`). + +You can use the DB Browser or other tools to explore the data and work on your +queries if you wish, but to complete the assignment you should write a file +`rpg_queries.py` that imports `sqlite3` and programmatically executes and +reports results for the above queries. + +Some of these queries are challenging - that's OK! You can keep working on them +tomorrow as well (we'll visit loading the same data into PostgreSQL). It's also +OK to figure out the results partially with a query and partially with a bit of +logic or math afterwards, though doing things purely with SQL is a good goal. 
+[Subqueries](https://www.w3resource.com/sql/subqueries/understanding-sql-subqueries.php) +and [aggregation functions](https://www.sqltutorial.org/sql-aggregate-functions/) +may be helpful for putting together more complicated queries. + +## Assignment - Part 2, Making and populating a Database + +Load the data (use `pandas`) from the provided file `buddymove_holidayiq.csv` +(the [BuddyMove Data +Set](https://archive.ics.uci.edu/ml/datasets/BuddyMove+Data+Set)) - you should +have 249 rows, 7 columns, and no missing values. The data reflects the number of +place reviews by given users across a variety of categories (sports, parks, +malls, etc.). + +Using the standard `sqlite3` module: + +- Open a connection to a new (blank) database file `buddymove_holidayiq.sqlite3` +- Use `df.to_sql` + ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)) + to insert the data into a new table `review` in the SQLite3 database + +Then write the following queries (also with `sqlite3`) to test: + +- Count how many rows you have - it should be 249! +- How many users who reviewed at least 100 in the `Nature` category also + reviewed at least 100 in the `Shopping` category? +- (*Stretch*) What is the average number of reviews for each category? + +Your code (to reproduce all above steps) should be saved in +`buddymove_holidayiq.py`, and added to the repository along with the generated +SQLite database. + +## Resources and Stretch Goals + +For a more complicated example SQLite database with a number of tables to play +with, check out this [SQLite Sample +Database](https://www.sqlitetutorial.net/sqlite-sample-database/). + +The RPG data also exists in a [JSON +file](https://github.com/LambdaSchool/Django-RPG/blob/master/testdata.json) - +try loading it with the standard [json +module](https://docs.python.org/3.5/library/json.html), and reproducing the +above queries with direct manipulation of the Python dictionaries. 
Also, try to +load it into a `pandas` dataframe and reproduce the above queries with +appropriate dataframe function calls. + +This database is part of a Django (Python webapp framework) application, the +[Django-RPG](https://github.com/LambdaSchool/Django-RPG/tree/master) - check it +out, and (though this is very much a stretch goal) you can [get started with +Django](https://www.djangoproject.com/start/) and see if you can run it +(definitely use `pipenv`!). If you are able to, then you can use the [Django +ORM](https://docs.djangoproject.com/en/2.1/topics/db/) (object-relational +mapping, a way to interact with SQL through programming language objects), and +[query](https://docs.djangoproject.com/en/2.1/topics/db/queries/) the data. +You'll find that the questions we answered with pure SQL are remarkably simple +to answer using the ORM. + +If you need one more stretch goal - the RPG data was generated using +[django-autofixture](https://github.com/volrath/django-autofixture), a tool that +facilitates tests by randomly generating data. Check it out, and if you got +Django working, see if you can generate more data. 
diff --git a/module1-introduction-to-sql/buddymove_holidayiq.py b/module1-introduction-to-sql/buddymove_holidayiq.py new file mode 100644 index 00000000..1ea7cfbb --- /dev/null +++ b/module1-introduction-to-sql/buddymove_holidayiq.py @@ -0,0 +1,36 @@ +import pandas as pd +import sqlite3 + +df = pd.read_csv("buddymove_holidayiq.csv") + +conn = sqlite3.connect('buddymove_holidayiq.sqlite3') +c = conn.cursor() + +df.to_sql("buddymove_holidayiq", conn, if_exists="replace") + +def connect_to_db(db_name="buddymove_holidayiq.sqlite3"): + return sqlite3.connect(db_name) + +def execute_query(cursor, query): + cursor.execute(query) + return cursor.fetchall() + +ROW_COUNT = """ +SELECT COUNT(*) FROM buddymove_holidayiq +""" + +USER_REVIEWS = """ + +SELECT COUNT(*) +FROM buddymove_holidayiq +WHERE buddymove_holidayiq.Nature >= 100 +AND buddymove_holidayiq.Shopping >= 100 +""" + +if __name__ == "__main__": + conn = connect_to_db() + curs = conn.cursor() + row_count = execute_query(curs, ROW_COUNT) + print("Row Count:", row_count) + user_reviews = execute_query(curs, USER_REVIEWS) + print("User Reviews:", user_reviews) \ No newline at end of file diff --git a/module1-introduction-to-sql/buddymove_holidayiq.sqlite3 b/module1-introduction-to-sql/buddymove_holidayiq.sqlite3 new file mode 100644 index 00000000..a01c1a27 Binary files /dev/null and b/module1-introduction-to-sql/buddymove_holidayiq.sqlite3 differ diff --git a/module1-introduction-to-sql/rpg_db_example.py b/module1-introduction-to-sql/rpg_db_example.py new file mode 100644 index 00000000..3cc1a783 --- /dev/null +++ b/module1-introduction-to-sql/rpg_db_example.py @@ -0,0 +1,102 @@ +import sqlite3 + +def connect_to_db(db_name="rpg_db.sqlite3"): + return sqlite3.connect(db_name) + +def execute_query(cursor, query): + cursor.execute(query) + return cursor.fetchall() + +GET_CHARACTERS = """ + SELECT * + FROM charactercreator_character +""" + +CHARACTER_COUNT = """ + SELECT COUNT(*) + FROM charactercreator_character +""" +CLASS_COUNT = """ +SELECT 
(SELECT COUNT(*) FROM charactercreator_cleric) AS cleric, +(SELECT COUNT(*) FROM charactercreator_fighter) AS fighter, +(SELECT COUNT(*) FROM charactercreator_mage) AS mage, +(SELECT COUNT(*) FROM charactercreator_necromancer) AS necromancer, +(SELECT COUNT(*) FROM charactercreator_thief) AS thief +""" +ITEM_COUNT = """ + SELECT COUNT(*) + FROM armory_item +""" +WEP_COUNT = """ +SELECT COUNT(*) name +FROM armory_item +INNER JOIN armory_weapon +ON armory_item.item_id = armory_weapon.item_ptr_id +""" +ITEMS_NO_WEPS = """ +SELECT( +SELECT COUNT(*) +FROM armory_item +) - +(SELECT COUNT(*) +FROM armory_weapon +) +""" +CHAR_ITEM_COUNT = """ +SELECT character_id, COUNT(*) +FROM charactercreator_character_inventory +GROUP BY character_id LIMIT 20; +""" +CHAR_WEP_COUNT = """ +SELECT charactercreator_character_inventory.character_id, COUNT(*) +FROM charactercreator_character_inventory +INNER JOIN armory_weapon ON charactercreator_character_inventory.item_id = armory_weapon.item_ptr_id +GROUP BY charactercreator_character_inventory.character_id LIMIT 20 +""" +AVG_WEAPONS = """ +SELECT AVG(num_weapons) +FROM +( +SELECT charactercreator_character_inventory.character_id, COUNT(*) AS num_weapons +FROM charactercreator_character_inventory +INNER JOIN armory_weapon ON charactercreator_character_inventory.item_id = armory_weapon.item_ptr_id +GROUP BY charactercreator_character_inventory.character_id +) +""" +AVG_ITEMS = """ +SELECT AVG(num_items) +FROM +( +SELECT charactercreator_character_inventory.character_id, COUNT(*) AS num_items +FROM charactercreator_character_inventory +INNER JOIN armory_item ON charactercreator_character_inventory.item_id = armory_item.item_id +GROUP BY charactercreator_character_inventory.character_id +) +""" + +if __name__ == "__main__": + conn = connect_to_db() + curs = conn.cursor() + char_count = execute_query(curs, CHARACTER_COUNT) + results = execute_query(curs, GET_CHARACTERS) + class_count = execute_query(curs, CLASS_COUNT) + item_count = 
execute_query(curs, ITEM_COUNT) + wep_count = execute_query(curs, WEP_COUNT) + items_no_weps = execute_query(curs, ITEMS_NO_WEPS) + char_item_count = execute_query(curs, CHAR_ITEM_COUNT) + char_wep_count = execute_query(curs, CHAR_WEP_COUNT) + avg_items = execute_query(curs, AVG_ITEMS) + avg_weapons = execute_query(curs, AVG_WEAPONS) + print(results[0]) + print("Character Count:", char_count) + print("Class Count (cleric, fighter, mage, necromancer, thief):", class_count) + print("Item Count:", item_count) + print("Weapon Count:", wep_count) + print("Items without Weapons:", items_no_weps) + print("Items per character ID:", char_item_count) + print("Weapons per character ID:", char_wep_count) + print("Average Number of Items Per Character:", avg_items) + print("Average Number of Weapons Per Character:", avg_weapons) + + + diff --git a/module1-introduction-to-sql/this_doesnt_exist.sqlite3 b/module1-introduction-to-sql/this_doesnt_exist.sqlite3 new file mode 100644 index 00000000..8a77dc58 Binary files /dev/null and b/module1-introduction-to-sql/this_doesnt_exist.sqlite3 differ diff --git a/module2-sql-for-analysis/titanic.sqlite3 b/module2-sql-for-analysis/titanic.sqlite3 new file mode 100644 index 00000000..fad57907 Binary files /dev/null and b/module2-sql-for-analysis/titanic.sqlite3 differ diff --git a/module2-sql-for-analysis/titanic_instert.py b/module2-sql-for-analysis/titanic_instert.py new file mode 100644 index 00000000..1239fb68 --- /dev/null +++ b/module2-sql-for-analysis/titanic_instert.py @@ -0,0 +1,65 @@ +import psycopg2 +import sqlite3 + +pg_dbname = "vkrdlbel" +pg_user= "vkrdlbel" +pg_password = "luzKEtW5bsdypA4gwGerKxZ76WI00PyU" +pg_host = "topsy.db.elephantsql.com" + +pg_conn = psycopg2.connect(dbname = pg_dbname, + user = pg_user, + password = pg_password, + host = pg_host) + +pg_curs = pg_conn.cursor() + +pg_curs.close() + +sl_conn = sqlite3.connect('titanic.sqlite3') +sl_curs = sl_conn.cursor() + +get_passengers = "SELECT * FROM titanic;" 
+sl_curs.execute(get_passengers) +passengers = sl_curs.fetchall() +sl_curs.execute('PRAGMA table_info(titanic);') +sl_curs.fetchall() + +create_titanic_passengers_table = """ +CREATE TABLE titanic_passengers_3( + id SERIAL PRIMARY KEY, + Survived INT, + Pclass INT, + Name TEXT, + Sex TEXT, + Age REAL, + Siblings_Spouses_Aboard INT, + Parents_Children_Aboard INT, + Fare REAL +); +""" + +def refresh_connection_and_cursor(conn, curs): + curs.close() + conn.close() + pg_conn = psycopg2.connect(dbname = pg_dbname, user = pg_user, password = pg_password, host = pg_host) + pg_curs = pg_conn.cursor() + return pg_conn, pg_curs +pg_conn, pg_curs = refresh_connection_and_cursor(pg_conn, pg_curs) + + +# Committing table to instance +pg_curs.execute(create_titanic_passengers_table) +pg_conn.commit() + + +# Inserting passengers into empty table +for passenger in passengers: + insert_passenger = """ + INSERT INTO titanic_passengers_3 + (Survived, Pclass, Name, Sex, Age, Siblings_Spouses_Aboard, Parents_Children_Aboard, Fare) + VALUES """ + str(passenger).replace('"', "'") + ";" + pg_curs.execute(insert_passenger) + + +# Committing to database +pg_conn.commit() diff --git a/module2-sql-for-analysis/titanic_table.py b/module2-sql-for-analysis/titanic_table.py new file mode 100644 index 00000000..3ac892d6 --- /dev/null +++ b/module2-sql-for-analysis/titanic_table.py @@ -0,0 +1,14 @@ +import psycopg2 +import sqlite3 +import pandas as pd + +conn = sqlite3.connect('titanic.sqlite3') +curs = conn.cursor() + +curs.execute('CREATE TABLE titanic (Survived, Pclass, Name, Sex, Age, Siblings_Spouses_Aboard, Parents_Children_Aboard, Fare)') +conn.commit() + +df = pd.read_csv('titanic.csv') +df['Name'] = df['Name'].apply(lambda x: x.replace("'", "''")) + +df.to_sql('titanic', conn, if_exists ='replace', index= False) \ No newline at end of file diff --git a/module3-nosql-and-document-oriented-databases/mongo.py b/module3-nosql-and-document-oriented-databases/mongo.py new file mode 100644 index 
00000000..cdabb083 --- /dev/null +++ b/module3-nosql-and-document-oriented-databases/mongo.py @@ -0,0 +1,88 @@ +import pymongo +import sqlite3 + + + +password = "?" +dbname = "Cluster0" + +def create_connection(password, dbname): + client = pymongo.MongoClient( + "mongodb+srv://rgiuffre90:" + password +"@cluster0.h7zhk.mongodb.net/"+ dbname +"?retryWrites=true&w=majority" + ) + return client + + +def show_all(db): + all_docs = list(db.test.find()) + return all_docs + +# Sqlite3 transfer from rpg dataset + +conn = sqlite3.connect('rpg_db.sqlite3') +curs = conn.cursor() + +# Create query + +def new_query(query): + curs.execute(query) + return curs.fetchall() + +query = """ +SELECT * FROM charactercreator_character; +""" +results = new_query(query) + +# loop to extract data + +characters = [] + +for character in results: + doc = { + 'character_id' : character[0], + 'name' : character[1], + 'level' : character[2], + 'exp' : character[3], + 'hp' : character[4], + 'strength' : character[5], + 'intelligence' : character[6], + 'dexterity' : character[7], + 'wisdom' : character[8] + } + characters.append(doc) + +# Documents + +doc1= {'X': 1} + +blanknames_doc = { + 'food': 'salmon', + 'color': 'pink', + 'number': 3 +} + +blanknames_doc2 = { + 'food': 'lettuce', + 'color': 'gray', + 'number': 0 +} + +blanknames_doc3 = { + 'city': 'New York', + 'color': 'brown' +} + +all_docs = [blanknames_doc, blanknames_doc2, blanknames_doc3] + +if __name__ == "__main__": + client = create_connection(password, dbname) + db = client.test + db2 = client.rpgdata + db.test.insert_one(doc1) + db.test.insert_many(all_docs) + db2.rpgdata.insert_many(characters) + + print(db) + print(db.test.count_documents({'X': 1})) + print(db.test.find_one({'X': 1})) + print(show_all(db)) \ No newline at end of file
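
Note on the row-to-document loop in `mongo.py`: it maps tuple positions to keys by hand, which breaks silently if the column order changes. One possible alternative, sketched below, reads the key names from `cursor.description` instead. The helper name `rows_to_documents` and the in-memory demo table with made-up rows are assumptions for illustration only, not part of this change; the sketch uses only the standard `sqlite3` module so it runs without a MongoDB cluster.

```python
import sqlite3


def rows_to_documents(conn, query):
    """Run `query` and return each row as a dict keyed by column name."""
    cursor = conn.execute(query)
    # cursor.description holds one 7-tuple per column; item 0 is the name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical in-memory stand-in for rpg_db.sqlite3 (made-up rows,
# and only three of the real table's columns).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE charactercreator_character "
    "(character_id INTEGER PRIMARY KEY, name TEXT, level INTEGER)"
)
demo.executemany(
    "INSERT INTO charactercreator_character (name, level) VALUES (?, ?)",
    [("Aliquid", 0), ("Optio", 0)],
)

documents = rows_to_documents(demo, "SELECT * FROM charactercreator_character")
print(documents[0])
```

The resulting list of dicts could then be handed to `insert_many` in the same way the hand-built `characters` list is.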