yelp-is-all-you-need
Big Data Mining and Management Projects
A subset of the Yelp dataset is used to develop deep learning models for sentiment classification and link prediction, and to build a recommender system.
Project 1: Sentiment Classification
This project addresses multiclass sentiment classification on a subset of the Yelp dataset. The data consists of reviews, attributes such as ‘funny’, ‘cool’ and ‘useful’, and a star rating from 1 to 5 for each review, which serves as the label. Two modelling approaches are compared: 1) a heavyweight feature-engineering-based ensemble model and 2) a model based on contextualized word representations (BERT).
[Figures: Ensemble model motivation; BERT model motivation]
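To make the data layout described above concrete, here is a minimal sketch of how one review record might be turned into a feature vector and a class label. It is not the notebook's actual pipeline: the `featurize` helper, the two hand-crafted text features, and the dictionary keys (`text`, `funny`, `cool`, `useful`, `stars`) are assumptions chosen for illustration.

```python
def featurize(review):
    """Turn one review record into (features, label).

    Assumed (hypothetical) record layout: a dict with the keys
    'text', 'funny', 'cool', 'useful' and 'stars'.
    """
    text = review["text"]
    tokens = text.lower().split()
    features = [
        float(len(tokens)),       # review length in whitespace tokens
        float(text.count("!")),   # crude emphasis signal
        float(review["funny"]),   # vote attributes from the dataset
        float(review["cool"]),
        float(review["useful"]),
    ]
    # Stars range from 1 to 5; shift to 0..4 so they can be used as class indices.
    label = int(review["stars"]) - 1
    return features, label


sample = {"text": "Great food! Great service!",
          "funny": 0, "cool": 1, "useful": 2, "stars": 5}
features, label = featurize(sample)
print(features, label)
```

A feature-engineering ensemble would consume vectors like these, whereas a BERT-based model would typically take the raw review text and learn its own representation.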
Project 2: Link Prediction
This project addresses link prediction on a subset of the Yelp dataset. The data consists of user_id and friends fields, which together define a directed graph whose edges serve as the labels. Two random-walk-based embedding algorithms are compared: 1) DeepWalk and 2) node2vec.
[Figures: DeepWalk model motivation; node2vec model motivation]
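The two algorithms differ mainly in how they sample walks over the friendship graph. The sketch below, written under stated assumptions rather than taken from the notebooks, builds a directed graph from (user_id, friends) rows and samples a node2vec-style second-order walk with return parameter `p` and in-out parameter `q`. With `p = q = 1` every neighbour is weighted equally, which corresponds to the uniform walk used by DeepWalk. The helper names, the toy rows, and the choice to treat a link in either direction as "distance 1 from the previous node" are all illustrative assumptions.

```python
import random


def build_graph(rows):
    """Build a directed adjacency map from (user_id, friends) rows."""
    graph = {}
    for user_id, friends in rows:
        graph.setdefault(user_id, set()).update(friends)
        for friend in friends:
            graph.setdefault(friend, set())  # make sure every node has an entry
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style walk; p = q = 1 gives a uniform (DeepWalk-style) walk."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])
        if not neighbours:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is always uniform
            continue
        prev = walk[-2]
        weights = []
        for node in neighbours:
            if node == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif node in graph[prev] or prev in graph[node]:
                weights.append(1.0)       # stays close to the previous node
            else:
                weights.append(1.0 / q)   # moves further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy data in the (user_id, friends) shape.
rows = [("u1", ["u2", "u3"]), ("u2", ["u3"]), ("u3", ["u1"])]
graph = build_graph(rows)
print(biased_walk(graph, "u1", length=5, p=0.5, q=2.0))
```

In both algorithms the sampled walks are then typically fed to a skip-gram model to learn node embeddings, and candidate edges are scored from the embeddings of their endpoints.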
Project 3: Recommender System
This project addresses rating prediction on a subset of the Yelp dataset. The data consists of users and businesses, with the rating each user has given the respective business. In addition, various user attributes and business attributes are available to supplement these ratings. The Wide and Deep Model (WDL) was chosen because it outperformed the Neural Collaborative Filtering Model (NCF) in our analysis, with a validation RMSE (Root Mean Squared Error) of 0.9996 compared to 1.0533 for NCF.
[Figures: NCF model motivation; WDL model motivation]
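For reference, the validation metric quoted above can be computed as in the following sketch. The `rmse` and `relative_improvement` helpers and the toy ratings are illustrative assumptions; only the two RMSE values (0.9996 and 1.0533) come from this README.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline, candidate):
    """Fraction by which `candidate` lowers the error relative to `baseline`."""
    return (baseline - candidate) / baseline


# Hypothetical toy ratings, only to show the call shape.
print(rmse([5, 3, 4, 1], [4.5, 3.5, 4.0, 2.0]))

# Validation RMSE values reported in this README (NCF as baseline, WDL as candidate).
print(f"{relative_improvement(1.0533, 0.9996):.1%}")
```

Because ratings lie on a 1 to 5 scale, an RMSE close to 1.0 means predictions are off by roughly one star on average in the root-mean-square sense.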
Usage
All notebooks can be downloaded and run on Google Colab.
Acknowledgements
We are grateful to the Big Data Mining and Management faculty.