Sentiment Analysis Using a Supervised Binary Text Classifier (Utilizing Large Language Models – BERT and XLNet)

Key Techniques to Deliver Our Project:

Defining a function that applies several pre-processing steps to a given text (drug user reviews)
Creating a binary sentiment feature from the rating (1 for positive, 0 for negative)
n-gram tokenisation to consider word collocations
Word clouds for positive and negative reviews
Baseline model – for our use case, we used Neural Bag of Words (BoW) and a logistic regression classifier with n-gram features as the baseline
Classification approach – supervised binary classification using Large Language Models (BERT and XLNet), incorporating training and fine-tuning. The results were then compared against our baseline model
Evaluation using the F1 score, as the data was imbalanced
Having implemented several models on the same dataset with high-dimensional features, we conclude that the XLNet LLM performs best as both feature extractor and classifier, with a score of 0.93
Overall, we met our objective, which was to analyse sentiment in drug user review text using a supervised binary text classifier that labels each review as positive or negative. By analysing the sentiment expressed in online drug reviews, healthcare providers and manufacturers can gain a more comprehensive understanding of the strengths and weaknesses of their products. This information can inform product development and improvement efforts and help ensure that products meet the needs and expectations of patients and consumers.
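
To make the first few techniques and the choice of metric more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code from our repository: the function names, the specific cleaning steps (decoding HTML entities, lowercasing, dropping non-letters) and the rating cut-off of 7 are all hypothetical, since the summary above does not spell them out.

```python
import html
import re

# Hypothetical cut-off: the summary does not state which ratings count as positive.
POSITIVE_THRESHOLD = 7


def preprocess(text):
    """Assumed cleaning steps: decode HTML entities, lowercase,
    replace non-letters with spaces, then collapse repeated whitespace."""
    text = html.unescape(text).lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def rating_to_label(rating, threshold=POSITIVE_THRESHOLD):
    """Map a numeric rating to 1 (positive) or 0 (negative)."""
    return 1 if rating >= threshold else 0


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the class given by `positive`: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    review = "This drug WORKED well &amp; fast!!"
    tokens = preprocess(review).split()
    print(tokens)
    print(ngrams(tokens, 2))
    print(rating_to_label(9), rating_to_label(3))

    # Imbalanced toy labels: a model that always predicts "positive"
    # is 80% accurate, yet its F1 on the negative class is zero.
    y_true = [1] * 8 + [0] * 2
    y_pred = [1] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, f1_score(y_true, y_pred, positive=0))
```

The toy example at the end shows why accuracy alone can mislead on imbalanced data: always predicting the majority class looks accurate while finding none of the minority class. In a fuller pipeline, the unigram and bigram counts would typically come from a library vectoriser rather than a hand-written helper, for instance scikit-learn's CountVectorizer with its ngram_range parameter.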

Discover the code behind the insights – check out our GitHub repository for this Natural Language Processing project