- Project Owner: @dark-teal-coder
- First Published Date: 2022-12-19
- Title: Python Data Analysis of Tech Gadget Sales with Pandas
- Difficulty:
  - Beginner
  - Intermediate
  - Advanced
- Scale:
  - Small
  - Medium
  - Large
This repository contains a Jupyter notebook that demonstrates how to analyze tech gadget sales in the US in 2019. We use the Python Pandas and Matplotlib libraries to answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
We walk through the following Pandas and Matplotlib methods:
- Concatenating multiple CSVs together to create a new DataFrame (`pd.concat()`)
- Adding columns
- Parsing cells as strings to make new columns (`.str`)
- Using the `apply()` method
- Using `groupby()` to perform aggregate analysis
- Plotting bar charts and line graphs to visualize our results
- Labeling our graphs
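As a rough illustration of how the steps listed above fit together, here is a minimal sketch using `pd.concat()`, the `.str` accessor, `apply()` and `groupby()`. It is not code from the notebook: two small in-memory DataFrames stand in for the monthly CSV files (which would normally be loaded with `pd.read_csv()`), and the column names (`Product`, `Quantity Ordered`, `Price Each`, `Order Date`, `Purchase Address`) and the `city_state()` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly CSV files (normally pd.read_csv()).
# Column names are assumptions for illustration, not taken from the repo.
jan = pd.DataFrame({
    "Product": ["USB-C Charging Cable", "Wired Headphones"],
    "Quantity Ordered": [2, 1],
    "Price Each": [11.95, 11.99],
    "Order Date": ["01/22/19 21:25", "01/28/19 14:15"],
    "Purchase Address": ["944 Walnut St, Boston, MA 02215",
                         "185 Maple St, Portland, OR 97035"],
})
feb = pd.DataFrame({
    "Product": ["Wired Headphones"],
    "Quantity Ordered": [3],
    "Price Each": [11.99],
    "Order Date": ["02/07/19 18:46"],
    "Purchase Address": ["538 Adams St, San Francisco, CA 94016"],
})

# Concatenate the monthly frames into one DataFrame with a fresh index.
sales = pd.concat([jan, feb], ignore_index=True)

# Parse cells as strings: "MM/DD/YY HH:MM" -> month number via slicing.
sales["Month"] = sales["Order Date"].str[0:2].astype(int)

# Add a computed column.
sales["Sales"] = sales["Quantity Ordered"] * sales["Price Each"]


# Hypothetical helper: "street, city, ST zip" -> "city (ST)".
def city_state(address: str) -> str:
    parts = address.split(",")
    return f"{parts[1].strip()} ({parts[2].split()[0]})"


# apply() runs the helper on every address.
sales["City"] = sales["Purchase Address"].apply(city_state)

# Aggregate analysis: total sales per month.
monthly = sales.groupby("Month")["Sales"].sum()
print(monthly)
```

The resulting `monthly` Series is what you would then pass to Matplotlib, for example `plt.bar(monthly.index, monthly.values)`, to draw a bar chart.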
- Python 3
- Python Package Installer/Manager (`pip`)
  - If you installed Python from python.org, you should already have `pip`. If it is not installed, you can use the command `py -m ensurepip --default-pip` to bootstrap it from the standard library (`py` is the Windows launcher; on macOS and Linux, use `python3` instead). If you are using Linux, you may have to install `pip` separately with your distribution's package manager. You can find out more about the `pip` tool in the pip documentation.
- Text Editor and Integrated Development Environment (IDE)
- Command-line interface (CLI)
  - You can install the open-source PowerShell on Windows, Linux and macOS if you do not have, or do not want to use, a pre-installed CLI on your local machine.
Check whether Python is installed by running `python --version`, or simply `python -V`, in the CLI. Git-clone the project repository from GitHub to your local machine. Use the command `py -m pip install package_name` to install the necessary Python libraries; see the pip documentation to learn more about `pip install`. The libraries required are imported at the top of the `.py` script file. For example, you may need the `requests` and `beautifulsoup4` libraries if you see the following lines at the top of the script file:
```python
import requests
from bs4 import BeautifulSoup
```
If pip fails to locate the relevant packages, you may find them on the Python Package Index (PyPI). Use `python file_name.py` to run the script in a CLI, or use an IDE such as VS Code, which usually has a [Run] button in the top-right corner of the open script file.
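The checks described above can also be done from Python itself. This minimal sketch prints the running Python version and lists which required libraries cannot be imported yet. The `REQUIRED` list and the `missing_packages()` helper are hypothetical; replace the names with the imports at the top of the script you want to run. Import names and PyPI package names can differ (`bs4` is installed as `beautifulsoup4`).

```python
import importlib.util
import sys

# Hypothetical list of import names; copy these from the top of the script.
# "bs4" is the import name of the beautifulsoup4 package.
REQUIRED = ["requests", "bs4", "pandas"]


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Same information as running `python --version` in the CLI.
print("Python", sys.version.split()[0])
if sys.version_info.major < 3:
    print("This project expects Python 3.")

missing = missing_packages(REQUIRED)
if missing:
    # PyPI names may differ from import names (bs4 -> beautifulsoup4).
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```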
- Click [Code]
- Click [Download ZIP]
- Extract the .zip file to the working directory
To access all of the files, fork this repo and then clone it locally.
For more information, please refer to Fork a repo.
- Open a command-line interface
- Type `pip install pandas`
- Press [Enter]
For more information, please refer to Installing Pandas.
Prerequisite: Python¹
- Run `pip3 install --upgrade pip` to upgrade to the latest version of `pip`
- Run `pip3 install jupyter` to install Jupyter Notebook
For more information, please refer to Installing the Classic Jupyter Notebook Interface.
- pandas.DataFrame.any documentation
- pandas.DataFrame.dropna documentation
- pandas.to_numeric documentation
- pandas.to_datetime documentation
- pandas.Series.dt.month documentation
- pandas.DataFrame.groupby documentation
- matplotlib.pyplot.plot documentation
- matplotlib.pyplot.grid documentation
- pandas.DataFrame.duplicated documentation
- pandas.DataFrame.transform documentation
- itertools.combinations documentation
- itertools.combinations() in Python
- collections.Counter documentation
- Python's Counter: The Pythonic Way to Count Objects
- Update Method Of Counter Class
- matplotlib.pyplot.subplots documentation
- matplotlib.axes.Axes.twinx documentation
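The documentation pages listed above cover the cleaning and "products sold together" steps of this kind of analysis. The sketch below shows one way to combine them; it is an illustration under assumptions, not the notebook's code. The raw rows and column names are made up, and a blank row plus a stray header row are included only to exercise the cleaning steps. It uses `Series.duplicated()` and `SeriesGroupBy.transform()`, the Series and GroupBy counterparts of the DataFrame methods listed.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical raw rows (all strings, as read from CSV) with a blank row
# and a repeated header row mixed in. Column names are assumptions.
raw = pd.DataFrame({
    "Order ID": ["1001", "1001", None, "Order ID", "1002", "1002",
                 "1003", "1003", "1004"],
    "Product": ["iPhone", "Lightning Charging Cable", None, "Product",
                "Google Phone", "USB-C Charging Cable",
                "iPhone", "Lightning Charging Cable", "Wired Headphones"],
    "Price Each": ["700", "14.95", None, "Price Each", "600", "11.95",
                   "700", "14.95", "11.99"],
    "Order Date": ["04/19/19 08:46", "04/19/19 08:46", None, "Order Date",
                   "04/07/19 22:30", "04/07/19 22:30",
                   "05/12/19 14:38", "05/12/19 14:38", "05/30/19 09:27"],
})

# Drop rows in which every value is missing.
df = raw.dropna(how="all")
# Drop the stray header row.
df = df[df["Order Date"] != "Order Date"].copy()

# Convert text to proper dtypes and extract the month.
df["Price Each"] = pd.to_numeric(df["Price Each"])
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")
df["Month"] = df["Order Date"].dt.month

# Sanity check with DataFrame.any(): no missing values remain.
assert not df.isna().any().any()

# Keep only orders with more than one line item.
multi = df[df["Order ID"].duplicated(keep=False)].copy()
# transform() broadcasts each order's joined product list back to its rows.
multi["Grouped"] = multi.groupby("Order ID")["Product"].transform(lambda s: ",".join(s))
orders = multi[["Order ID", "Grouped"]].drop_duplicates()

# Count every 2-product combination; sorting makes pairs order-independent.
pair_counts = Counter()
for grouped in orders["Grouped"]:
    pair_counts.update(combinations(sorted(grouped.split(",")), 2))

print(pair_counts.most_common(3))
```

Sorting each order's products before calling `combinations()` is a design choice of this sketch: without it, ("iPhone", "Lightning Charging Cable") and ("Lightning Charging Cable", "iPhone") would be counted as different pairs.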
1st Completion Date: Dec 20, 2022
Footnotes
1. Python is a requirement for installing the Jupyter Notebook.