Blastguy/SpoilerAlert

Don’t Spoil on Me

Your new favorite site to thwart spoilers!

Inspiration

When brainstorming a project idea for HackGT on Friday evening, the one thing we knew was that we wanted to build an application that used some form of machine learning or artificial intelligence. A few of us are currently taking courses in this field and wanted to apply our skills in a practical setting, and HackGT was a great opportunity to explore! We chose to build a spoiler alert app because it addressed a quality-of-life problem we had all faced and was one of our more practical ideas, given that the hackathon lasts at most 36 hours.

What it does

Our program takes text that the user pastes into a text box, runs it through our machine learning model, and reports whether the text contains a spoiler. Ideally, the user pastes medium-length to long text so that they do not read a spoiler at first glance. This matters because the user sees the text for at least as long as it takes to Ctrl+A → Ctrl+C → Ctrl+V, so the longer the text (e.g. a movie review), the lower the chance that they read a spoiler while copying it.
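The flow described above (text box input, model, spoiler verdict) can be sketched as a small Flask endpoint. This is only an illustration under stated assumptions: the `/check` route, the `text` form field, and the `contains_spoiler` helper are hypothetical names, and the keyword check is a placeholder standing in for the trained model so that the sketch runs on its own.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def contains_spoiler(text: str) -> bool:
    """Hypothetical stand-in for the trained model.

    A real deployment would call the classifier here; this keyword
    check exists only so the sketch is runnable without a model file.
    """
    keywords = ("dies", "killer is", "twist ending")
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


@app.route("/check", methods=["POST"])
def check():
    # Read the pasted text from the submitted form (field name assumed).
    text = request.form.get("text", "")
    if not text.strip():
        return jsonify(error="No text provided"), 400
    # Return only the verdict, never the text, so nothing is spoiled.
    return jsonify(spoiler=contains_spoiler(text))
```

If this were saved as `app.py`, running `flask run` would serve it locally, and an HTML form posting a `text` field to `/check` would receive a JSON verdict.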

How we built it

First, to train our model, we needed a dataset of reviews labeled by whether or not they contain spoilers. We initially planned to build our own dataset by reading through movie reviews and labeling each one by hand, but we found an IMDb spoiler dataset on Kaggle. The URL to this dataset is https://www.kaggle.com/rmisra/imdb-spoiler-dataset. For the machine learning library, we used Scikit-Learn (originally developed by David Cournapeau), not only because it provided the algorithms we needed but also because it fits the free and open-source spirit that is the essence of hackathons. We chose the Random Forest Classifier as the algorithm for our model. Lastly, for web hosting, we registered a domain with Domain.com, used Google Cloud’s App Engine to host the application, and used the Flask (Python) micro web framework to handle the HTTP requests and run the input through our model. We also wrote HTML pages with a simple design and a text box to take input.
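As a rough illustration of the training step, the sketch below fits a Scikit-Learn Random Forest on a handful of labeled reviews. Several things here are assumptions rather than details from the project: the TF-IDF vectorizer (the write-up does not say how the text was turned into features), the hyperparameters, and the toy reviews, which are invented placeholders for the Kaggle data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented toy examples standing in for the Kaggle reviews.
reviews = [
    "The main character dies in the final scene.",
    "It turns out the narrator was the killer all along.",
    "The ending reveals that the whole story was a dream.",
    "Beautiful cinematography and a memorable soundtrack.",
    "The pacing drags a little in the second act.",
    "A fun family movie with solid performances.",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spoiler, 0 = no spoiler

# Assumed feature step: TF-IDF turns each review into a numeric vector,
# which the Random Forest then classifies.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(reviews, labels)

# Classify a new piece of text; the result is one of the labels above.
prediction = model.predict(["The villain dies before the credits roll."])[0]
print("spoiler" if prediction == 1 else "no spoiler")
```

Wrapping both steps in one pipeline means the same object handles raw text at training time and at prediction time, which keeps the web layer from having to know about the feature extraction.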

Challenges we ran into

A big challenge we ran into was choosing the right machine learning model. Initially, we tried a Support Vector Machine, but our accuracy was not very high: each trial scored between the high 40s and mid 50s in percent. Because of this low accuracy, we switched to the Random Forest Classifier, and after the change we consistently achieved accuracy of 74% and up. We also found that preprocessing the data took longer than expected because of the sheer size of the dataset and our limited resources. We could only provision a Google Cloud compute instance with so many cores and so much RAM without burning through our credits in an hour. To work around this, we reduced the size of our dataset. This definitely hurt our accuracy, but there was not much else we could do. If this were an app we developed outside of a hackathon, we would have much more time to let the computer preprocess the data. The third challenge was integrating the model into our web application. It seemed very easy at first, but Flask turned out not to work well with the methods we had written to run text through the model. To fix this, we added another function to the ML Python file that calls the necessary functions, processes the data, and returns the appropriate output.
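The write-up does not say how the dataset was reduced. One common way to shrink a labeled dataset without distorting the balance between spoiler and non-spoiler reviews is stratified sampling, sketched below with the standard library only. The `stratified_sample` helper, the `is_spoiler` key, and the generated rows are all hypothetical and serve only as an example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Keep roughly `fraction` of the rows from each class."""
    rng = random.Random(seed)

    # Group the rows by their label so each class is sampled separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sample = []
    for group in by_label.values():
        # Keep at least one row per class so no label disappears.
        keep = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, keep))

    rng.shuffle(sample)
    return sample


# Generated placeholder rows: 25 spoiler reviews and 75 non-spoiler reviews.
rows = (
    [{"text": f"spoiler review {i}", "is_spoiler": True} for i in range(25)]
    + [{"text": f"plain review {i}", "is_spoiler": False} for i in range(75)]
)
subset = stratified_sample(rows, "is_spoiler", fraction=0.2)
print(len(subset), sum(row["is_spoiler"] for row in subset))
```

Sampling each class separately keeps the subset at the same 1:3 ratio as the full set, so a smaller training run is at least measured against the same class balance.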

Accomplishments that we’re proud of

The accomplishment we are proudest of is the accuracy. We only had 36 hours for this hackathon, and frankly closer to 24, since Friday night went to brainstorming and planning and early Sunday morning went to tying up loose ends, creating the demo, and submitting the project. Given that, we know we did an amazing job creating an ML application with 74% accuracy. In addition, this was the first hackathon for most of the team, so venturing into the unknown was a new but exciting adventure! We are also proud of how much we learned in the last 36 hours!

What we learned

Everyone learned something new this weekend! The team members working on the AI and ML side got hands-on experience with Python machine learning libraries and with using datasets to produce a model. The frontend developer learned more about coding static web pages in HTML and adding CSS for styling. They also explored the Bootstrap framework, whose templates make CSS and JS easier to implement. The backend developer had never used Flask before, so this project taught them an incredible amount about a very useful web framework. They also got to experience the joys and sorrows of frontend web page development.

What's next for Don't Spoil on Me

The main expansion idea we had for Don’t Spoil on Me is to turn it from a web application into a browser extension. As a browser extension, it could scan the text on a page as soon as the user visits it and show a small alert that a spoiler may be present. This would spare the user the small chance of accidentally reading a spoiler while copying and pasting the text into the web page text box. The main drawback of a browser extension is that we would have to create and maintain multiple versions of this project, because the popular browsers on the market all work differently. At a minimum, we would have to create extensions for Chrome, Safari, and Firefox. Another possible improvement is the AI model. With a bigger dataset and more time for preprocessing, we are confident that our accuracy could rise into the high 80s or even the mid 90s in percent.

About

HackGT 2020 Project
