Natural Language Processing - Sentiment Analysis

We investigated how different Natural Language Processing (NLP) techniques could be used to perform sentiment analysis on real user-generated text from the Sentiment140 dataset [1]. We first investigated an LSTM model, then switched to the self-attention network code from [2] because of its potential speed and accuracy advantages. Our contributions were investigating how the training batch size and dropout rate affected the model's accuracy, and validating an existing model by reproducing it and applying it to a different dataset. After tuning the model on a smaller version of the dataset, we trained it on 160,000 tweets. On the test dataset, the model achieved an accuracy of around 80%.
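A study of how batch size and dropout rate affect accuracy can be organized as a small grid sweep. The sketch below is only an illustration of that idea, not the project's actual code: the function names `sweep`, `best_setting`, and `placeholder_evaluate`, and any grid values passed to them, are hypothetical. In real use, the placeholder would be replaced by a function that trains the model with the given settings and returns its measured test accuracy.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Setting = Tuple[int, float]  # (batch_size, dropout_rate)


def sweep(
    train_and_evaluate: Callable[[int, float], float],
    batch_sizes: Iterable[int],
    dropout_rates: Iterable[float],
) -> Dict[Setting, float]:
    """Evaluate every (batch_size, dropout_rate) pair and record its score."""
    results: Dict[Setting, float] = {}
    for batch_size, dropout in product(batch_sizes, dropout_rates):
        results[(batch_size, dropout)] = train_and_evaluate(batch_size, dropout)
    return results


def best_setting(results: Dict[Setting, float]) -> Setting:
    """Return the setting with the highest recorded score."""
    return max(results, key=results.get)


def placeholder_evaluate(batch_size: int, dropout: float) -> float:
    """Hypothetical stand-in that returns a made-up score so the sketch runs
    without a model or data. It is NOT a measurement of any real model."""
    return 1.0 - abs(dropout - 0.5) - abs(batch_size - 64) / 1000.0
```

Calling `sweep(placeholder_evaluate, [32, 64, 128], [0.1, 0.5])` returns a dictionary with one score per combination, which can then be plotted per dropout rate or passed to `best_setting`.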

The code and final report are available on GitHub:

Key Image

Comparison of test accuracy by dropout rate

Erick Jones
PhD Candidate

Erick Jones is a Ph.D. candidate in Operations Research and Industrial Engineering who develops multi-systems optimization models to analyze how energy systems, water resources, supply chains, urban space, and transportation networks operate in concert to influence economic and environmental well-being.