This project is a machine learning-based web application for detecting cyberbullying in tweets. It uses natural language processing (NLP) and supervised learning to classify text as bullying or not. The app is built with Streamlit for an interactive user interface.
- Input text/tweet and get instant cyberbullying prediction
- Uses a trained machine learning model (e.g., Decision Tree, Random Forest, or similar)
- Text preprocessing with NLTK (stopwords, tokenization)
- TF-IDF vectorization for feature extraction
- Model accuracy and evaluation metrics
- cyberbullying_tweets.csv: Contains labeled tweets for training and testing
- Classes: Bullying, Not Bullying (binary classification)
- Data Preprocessing
- Remove stopwords
- Tokenize and clean text
- Convert text to lowercase
- Feature Extraction
- TF-IDF Vectorizer transforms text into numerical features
- Vectorizer is saved as tfidf_vectorizer.pkl
- Model Training
- Model (e.g., DecisionTreeClassifier) is trained on the vectorized data
- Model is saved as bullying_model.pkl
- Model accuracy is evaluated and reported
- Prediction
- User input is preprocessed and vectorized
- Model predicts if the input is bullying or not
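The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the four toy sentences and their labels are invented for the example, the label encoding (1 = bullying, 0 = not bullying) is assumed, and a tiny hard-coded stopword set stands in for NLTK's English stopword list so the sketch runs without downloading NLTK data. The helper names `preprocess` and `predict` are hypothetical.

```python
import pickle
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

# Stand-in stopword list so the sketch runs offline; the project itself
# uses NLTK's English stopwords (an assumption about how they are loaded).
STOPWORDS = {"a", "an", "and", "are", "at", "for", "is", "of", "so", "the", "to", "you"}


def preprocess(text):
    """Lowercase, drop URLs and @mentions, tokenize, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z']+", text)
    return " ".join(t for t in tokens if t not in STOPWORDS)


# Toy examples invented for illustration (not taken from cyberbullying_tweets.csv).
texts = [
    "You are so stupid and ugly",
    "Nobody likes you, loser",
    "Have a great day at school",
    "Thanks for the help with homework",
]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = bullying, 0 = not bullying

# Feature extraction: fit TF-IDF on the cleaned text.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([preprocess(t) for t in texts])

# Model training on the vectorized data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, labels)

# Persist both artifacts under the file names used in this README.
with open("tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)
with open("bullying_model.pkl", "wb") as f:
    pickle.dump(model, f)


def predict(text):
    """Apply the same preprocessing and vectorizer, then classify."""
    features = vectorizer.transform([preprocess(text)])
    return int(model.predict(features)[0])
```

Note that prediction calls `vectorizer.transform`, not `fit_transform`, so new input is mapped into the vocabulary learned during training.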
- Main file: app.py
- Loads the trained model and vectorizer
- Provides a simple UI for text input and displays prediction
- Can be run locally or deployed online
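A rough sketch of how such an app could be structured is shown below. It is not the contents of the actual app.py: the function names `load_artifacts`, `classify`, and `run_app`, the UI strings, and the label mapping (1 = bullying, 0 = not bullying) are all assumptions made for illustration. Streamlit is imported inside `run_app` only so that the helper functions can be exercised without Streamlit installed.

```python
import pickle

# Assumed label encoding; adjust to match how the model was trained.
LABELS = {0: "Not Bullying", 1: "Bullying"}


def load_artifacts(model_path="bullying_model.pkl",
                   vectorizer_path="tfidf_vectorizer.pkl"):
    """Load the pickled model and TF-IDF vectorizer from disk."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, vectorizer, model):
    """Vectorize one piece of text and return a human-readable label."""
    # Simplified cleaning; a real app should reuse the exact preprocessing
    # that was applied at training time.
    cleaned = text.lower().strip()
    features = vectorizer.transform([cleaned])
    return LABELS[int(model.predict(features)[0])]


def run_app():
    """Render the text box, button, and prediction."""
    import streamlit as st  # lazy import, see the note above

    st.title("Cyberbullying Detection")
    model, vectorizer = load_artifacts()
    text = st.text_area("Enter a tweet or message")
    if st.button("Predict"):
        if not text.strip():
            st.warning("Please enter some text first.")
        else:
            st.write(f"Prediction: {classify(text, vectorizer, model)}")
```

Because Streamlit executes the script from top to bottom on every interaction, a file built this way would end with a plain `run_app()` call.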
- Install dependencies: pip install -r requirements.txt
- Start the app: streamlit run app.py
- The app will open in your browser.
- Push your project to GitHub
- Go to Streamlit Community Cloud
- Link your GitHub repo and select app.py as the main file
- Deploy and share your app
- app.py: Streamlit web app
- bullying_model.pkl: Trained ML model
- tfidf_vectorizer.pkl: TF-IDF vectorizer
- cyberbullying_tweets.csv: Dataset
- bullying-classification-accuracy-80.ipynb: Training notebook
- requirements.txt: Python dependencies
- Python 3.8+
- scikit-learn
- nltk
- streamlit
MIT License
- Rishav Shah
Feel free to modify this README to add more details about your model, dataset, or deployment process.