🎬 Analysis of Indian Movie Dataset using EDA, Classification, and Regression

A data science project focused on exploring, visualizing, and modeling an Indian movies dataset. This project applies Exploratory Data Analysis (EDA), classification, and regression techniques to uncover patterns and predict movie success metrics.

---

📂 Project Structure

├── data/                  # Dataset files (CSV)
├── eda/                   # EDA notebooks and visualizations
├── models/                # Classification & regression models
├── utils/                 # Utility functions
├── outputs/               # Graphs, plots, and predictions
├── requirements.txt       # Python dependencies
└── main.ipynb             # Main notebook (EDA + Modeling)

🧠 Key Features

✅ Data cleaning and preprocessing
📊 Exploratory Data Analysis (EDA) with Matplotlib & Seaborn
🤖 Machine Learning models:
- Classification (Decision Trees, Random Forests, etc.)
- Regression (Linear Regression, Random Forest Regressor, etc.)
📈 Evaluation metrics: Accuracy, MAE, RMSE, R²
🔍 Insightful visualizations and trend analysis
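
The four evaluation metrics listed above can be written out directly. The following is a minimal pure-Python sketch of their definitions, not the project's own evaluation code (which may well use scikit-learn's helpers instead). All sample values are made up for illustration.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of class labels predicted exactly right (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """R²: share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only; these are not results from this project.
true_labels = [1, 0, 1, 1]
pred_labels = [1, 0, 0, 1]
true_ratings = [6.0, 7.0, 8.0, 9.0]
pred_ratings = [6.5, 7.0, 7.5, 9.0]

print("Accuracy:", accuracy(true_labels, pred_labels))
print("MAE:", mae(true_ratings, pred_ratings))
print("RMSE:", rmse(true_ratings, pred_ratings))
print("R²:", r2(true_ratings, pred_ratings))
```

Accuracy applies to the classification models; MAE, RMSE, and R² apply to the regression models.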

📌 Dataset Overview

The dataset contains information about Indian movies, such as:

- Title, Genre, Language
- IMDb rating
- Number of votes
- Release date
- Budget and box office performance

📁 Source of dataset (add link if public)
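
Fields like these usually need type cleanup before modeling. The sketch below shows one way to do that with pandas. The column names (`title`, `imdb_rating`, `votes`, `release_date`), the sample rows, and the `load_movies` helper are all hypothetical; the real CSV in data/ may use different headers and formats.

```python
import io

import pandas as pd

# Hypothetical column names and made-up rows, standing in for the real CSV.
SAMPLE_CSV = """title,genre,language,imdb_rating,votes,release_date
Movie A,Drama,Hindi,7.8,"12,450",2019-03-15
Movie B,Action,Tamil,-,980,2021-11-05
Movie C,Comedy,Telugu,6.1,,2018-07-27
"""

def load_movies(source):
    """Read a movies CSV and coerce the numeric and date columns."""
    df = pd.read_csv(source)
    # Non-numeric placeholders such as "-" become NaN instead of raising.
    df["imdb_rating"] = pd.to_numeric(df["imdb_rating"], errors="coerce")
    # Strip thousands separators before converting vote counts.
    votes_text = df["votes"].astype(str).str.replace(",", "", regex=False)
    df["votes"] = pd.to_numeric(votes_text, errors="coerce")
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    # Keep only rows that have both a rating and a vote count.
    return df.dropna(subset=["imdb_rating", "votes"]).reset_index(drop=True)

movies = load_movies(io.StringIO(SAMPLE_CSV))
print(movies)
```

To use the same helper on a file, pass a path instead of the in-memory buffer, for example `load_movies("data/<your-file>.csv")`.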

🚀 Getting Started

1. Clone the repo

git clone https://github.com/adityaranjan08/Analysis-of-Movie-Dataset-Using-Exploratory-Data-Analysis-Classification-and-Regression.git
cd Analysis-of-Movie-Dataset-Using-Exploratory-Data-Analysis-Classification-and-Regression

2. Install dependencies

pip install -r requirements.txt

3. Run the notebook

Use Jupyter Notebook or Jupyter Lab:

jupyter notebook main.ipynb

📈 Results & Insights

Some interesting findings:

- Certain genres and languages are more likely to succeed.
- IMDb ratings are strongly correlated with vote counts.
- Regression models can reasonably predict movie popularity.

Detailed results and charts are available in the outputs/ folder.
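
A claim such as the rating-to-votes correlation can be checked with a few lines of pandas. The sketch below uses made-up numbers and hypothetical column names (`imdb_rating`, `votes`); it is not the analysis from the notebook. Because vote counts are often heavily skewed, it reports a rank-based (Spearman) coefficient alongside the linear (Pearson) one.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the project's dataset.
sample = pd.DataFrame({
    "imdb_rating": [5.0, 6.0, 7.0, 8.0, 9.0],
    "votes": [100, 400, 900, 1600, 2500],
})

def rating_vote_correlation(df):
    """Return (Pearson, Spearman) correlation between rating and votes."""
    # Pearson measures the strength of a linear relationship.
    pearson = df["imdb_rating"].corr(df["votes"])
    # Spearman is Pearson computed on ranks, so it captures any
    # monotonic relationship and is less sensitive to skew.
    spearman = df["imdb_rating"].rank().corr(df["votes"].rank())
    return pearson, spearman

pearson, spearman = rating_vote_correlation(sample)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```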

🛠 Tools & Technologies

- Python (Pandas, NumPy, Scikit-learn)
- Seaborn & Matplotlib for visualization
- Jupyter Notebook
- Git & GitHub

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Aditya Ranjan
📧 Email
🌐 GitHub

⭐️ If you found this repo helpful, feel free to give it a star!
