Multi-label classification with logistic regression
This is a prototype ML project built on data scraped from a large Dutch webshop. The labels come from categoryName, and title and subTitle are the features.
Summary
- Scrapy SitemapSpider to scrape items iteratively.
- Preprocess the data by removing stopwords and applying a SnowballStemmer, which reduces the different forms of a word to a single stem.
- Pipeline with LabelPowerset and LogisticRegression. This transforms the multi-label problem into a multi-class problem: the classifier is trained on all the unique label combinations in the training dataset.
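
The label powerset idea from the last step can be illustrated with a minimal, library-free sketch. It is not the project's actual pipeline: the helper names (`fit_label_powerset`, `transform`, `inverse_transform`) and the example category names are hypothetical and only show how each unique label combination becomes a single class.

```python
def fit_label_powerset(label_sets):
    """Assign one class id to every unique label combination seen in training."""
    combo_to_class = {}
    for labels in label_sets:
        combo = frozenset(labels)  # order of labels within an item does not matter
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
    return combo_to_class


def transform(label_sets, combo_to_class):
    """Turn multi-label targets into single multi-class targets."""
    return [combo_to_class[frozenset(labels)] for labels in label_sets]


def inverse_transform(class_ids, combo_to_class):
    """Map predicted class ids back to their label combinations."""
    class_to_combo = {cid: combo for combo, cid in combo_to_class.items()}
    return [sorted(class_to_combo[cid]) for cid in class_ids]


# Hypothetical categoryName values for four scraped items.
train_labels = [
    ["Boeken", "Koken"],
    ["Elektronica"],
    ["Koken", "Boeken"],  # same combination as the first item
    ["Boeken"],
]

mapping = fit_label_powerset(train_labels)
y_multiclass = transform(train_labels, mapping)
```

A multi-class classifier such as logistic regression would then be trained on `y_multiclass`, and its predictions decoded with `inverse_transform`. One consequence of this design is that a label combination that never occurs in the training data cannot be predicted.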