This project classifies pulsar stars using the Support Vector Machine (SVM) algorithm, a supervised learning method. The goal is to automate the identification of pulsars among the candidates collected during surveys by training a predictive model.
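As a rough illustration of the approach, the sketch below trains an SVM classifier with scikit-learn on synthetic data. It is not the code from the notebooks: the synthetic dataset, the feature count, the RBF kernel, the scaling step, and the class weighting are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the candidate data (assumption: 8 numeric features,
# with the positive class deliberately rare).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Hold out a stratified test split so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit an RBF-kernel SVM.
# class_weight="balanced" is one common way to handle class imbalance.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", class_weight="balanced"),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

To use real data, the synthetic `X` and `y` would be replaced with the feature columns and labels from the processed dataset.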
- Datasets: Holds the processed and raw datasets.
  - Processed_data: Contains processed data ready for analysis.
  - Raw_data: Contains raw data files.
- v_pred_test: Stores predicted outcomes on test data.
- notebooks: Jupyter notebooks for Exploratory Data Analysis (EDA) and model training.
- venv: A virtual environment directory for project dependencies.
- .gitignore: Specifies untracked files to ignore.
- README.md: Provides an overview of the project.
- requirements.txt: Lists all the necessary Python packages.
To run this project, follow these steps:
- Make sure Python 3.8 or later is installed on your machine.
- Clone the repository to your local environment.
- Navigate to the project's root directory and set up a Python virtual environment:
    python -m venv venv
- Activate the virtual environment:
  On Windows:
    .\venv\Scripts\activate
  On macOS and Linux:
    source venv/bin/activate
- Install the required dependencies:
    pip install -r requirements.txt
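Before installing dependencies, it can help to confirm that the interpreter meets the version requirement and that the virtual environment is active. The following is a minimal sketch using only the standard library. The helper names `meets_min_python` and `in_virtualenv` are hypothetical and not part of this repository.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the setup steps


def meets_min_python(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or current) Python version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True when running inside an environment created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version OK:", meets_min_python())
    print("Virtual environment active:", in_virtualenv())
```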
To perform EDA or train the SVM model, open the Jupyter notebooks located in the notebooks directory:
- EDA_Test_Data.ipynb: For exploratory data analysis on test data.
- EDA_Train_Data.ipynb: For exploratory data analysis on training data.
- MODEL_TRAINING.ipynb: For training the SVM model.
Run the notebooks sequentially to explore the data and train the model.
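If you prefer to run the notebooks without opening them, one option is to execute them in order with `jupyter nbconvert`. The sketch below assumes that Jupyter and nbconvert are installed in the active environment and that the intended order matches the order listed above. The function names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

NOTEBOOK_DIR = Path("notebooks")

# Assumed execution order: the order in which the notebooks are listed.
NOTEBOOK_ORDER = [
    "EDA_Test_Data.ipynb",
    "EDA_Train_Data.ipynb",
    "MODEL_TRAINING.ipynb",
]


def build_commands(notebook_dir=NOTEBOOK_DIR, order=NOTEBOOK_ORDER):
    """Build one `jupyter nbconvert` command per notebook, in order."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--inplace",  # write the executed outputs back into the same file
            str(Path(notebook_dir) / name),
        ]
        for name in order
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Because `check=True` raises on a non-zero exit code, a failure in an earlier notebook stops the run before model training starts.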
The Datasets directory is organized as follows:
- Processed_data: Processed files like pulsar_data_test_processed.csv for use in modeling.
- Raw_data: The original, unprocessed data files.
Predictions from the test data are saved in v_pred_test with filenames indicating they are predictions, such as Pulsar_data_test_Predicted.csv.
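The layout described above can be referenced from code with `pathlib`. This is a minimal sketch that assumes `Datasets` and `v_pred_test` both sit directly under the project root, which the description suggests but does not state outright. The helpers `project_paths` and `read_header` are hypothetical.

```python
import csv
from pathlib import Path


def project_paths(root="."):
    """Return the expected locations of the processed test data and predictions."""
    root = Path(root)
    return {
        "processed_test": root / "Datasets" / "Processed_data" / "pulsar_data_test_processed.csv",
        "predictions": root / "v_pred_test" / "Pulsar_data_test_Predicted.csv",
    }


def read_header(csv_path):
    """Return the header row of a CSV file, or None if the file is missing or empty."""
    csv_path = Path(csv_path)
    if not csv_path.exists():
        return None
    with csv_path.open(newline="") as handle:
        return next(csv.reader(handle), None)


if __name__ == "__main__":
    for label, path in project_paths().items():
        print(label, path, read_header(path))
```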
If you'd like to contribute, please fork the repository and create a pull request with your features or changes.
This project is open-source software licensed under the MIT license.