Amazon Review Analysis
Analyzing customer reviews allows brands to identify potential areas of improvement, gauge overall customer satisfaction, communicate brand messaging, identify products with the most reviews, and much more. However, as the volume and variety of review data continue to grow, effective analysis becomes more challenging.
When done right, a customer feedback analytics dashboard for your brand will look something like this:
In this article, we will outline current state-of-the-art methods for analyzing reviews and also look at AI text mining solutions for customer feedback analytics.
Review Analysis Process
Broadly, we can break down the review analysis process into 3 steps - Data Gathering, Analysis and Reporting.
Step 1: Data Gathering
This involves extracting data from e-commerce review sites like Amazon, BestBuy, Walmart, Bed Bath & Beyond, etc. This can be tricky because:
- Since each site displays reviews differently, a custom parser will be required for each site.
- Some sites (like Amazon) are very difficult to crawl.
- Custom parsers will need to be kept up to date with changes on the marketplaces.
Step 2: Review Analysis
In order to detect patterns and actionable insights from a large volume of review data, we need to derive structure out of the unstructured text.
- Topic Detection, or Review Aspect Classification, is the process of understanding which topic is being spoken about. This can be very tricky to get right. Consider the word “light” - its meaning changes completely depending on context. For example: “the headphones are very light”, “the burner is easy to light”, “screen is not visible under the light”.
- Sentiment Detection needs to take the context of the category into account. For example, the word “loud” can be a positive thing for a speaker or a headphone, but negative for an appliance like a washing machine.
- Likewise, we also need to take the context of the review into account, i.e. the word “cheap” is positive in “i got it for cheap” and negative in “it is cheaply made”.
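The context dependence of sentiment described in the last two points can be illustrated with a toy lookup. This is only a minimal sketch: production systems typically learn these distinctions from data rather than from hand-written tables, and the `CATEGORY_POLARITY` and `PHRASE_POLARITY` tables, the category names, and the `score_review` function are all hypothetical.

```python
import re

# Hypothetical table: the same word carries different sentiment per category.
CATEGORY_POLARITY = {
    ("loud", "speaker"): 1,
    ("loud", "headphone"): 1,
    ("loud", "washing machine"): -1,
}

# Hypothetical phrase table: the surrounding words change the polarity of "cheap".
PHRASE_POLARITY = {
    "got it for cheap": 1,
    "cheaply made": -1,
}


def score_review(text: str, category: str) -> int:
    """Return a crude sentiment score: >0 positive, <0 negative, 0 unknown."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    score = 0
    # Review-level context: match whole phrases in the text.
    for phrase, polarity in PHRASE_POLARITY.items():
        if phrase in lowered:
            score += polarity
    # Category-level context: look up single words for this category only.
    for (word, cat), polarity in CATEGORY_POLARITY.items():
        if cat == category and word in tokens:
            score += polarity
    return score
```

With these tables, `score_review("This speaker is really loud", "speaker")` returns 1, while a review containing "loud" in the "washing machine" category returns -1.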
Step 3: Reporting
Though it might seem straightforward, generating reports that do justice to review data can be quite challenging:
- Needs to be simple, easy to understand - but comprehensive
- Depth and Breadth - Needs to be broad enough to cover the whole category, but should also have the capability to drill down into the root cause for any event.
- Many top companies already have adopted best practices - gaining insight into their processes can jump-start the report generation process.
Step 1: Data Gathering
Extracting this review data is a time-consuming process and requires constant maintenance.
Customers speak freely about their product experiences in the places where they buy the product. Undoubtedly, the biggest online retailer today is Amazon, hence Amazon reviews play a crucial role in customer sentiment. In addition to Amazon review data, other marketplaces like BestBuy, Target, HomeDepot, Bed Bath & Beyond, etc. may also contain large volumes of reviews depending on the category.
In this step, we look at the process of gathering data from the various marketplace sources. The following tools are commonly used for web scraping:
- Selenium - Browser automation based scraping
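- lxml - Python library for HTML and XML parsing
- Beautiful Soup - Python library for navigating and extracting data from HTML and XML, usually on top of a parser such as lxml
- Selenium Python library - Works in conjunction with the Selenium standalone tool
- Nokogiri - Ruby library for HTML and XML parsing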
The right toolset depends on the technology the website uses. For example, pages rendered with JavaScript typically need browser automation, while static HTML can be parsed directly.
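As a minimal sketch of the parsing step, the following uses Python's standard-library `html.parser` on an in-memory snippet. The markup, the `review`, `rating` and `body` class names, and the `ReviewParser` class are made up for illustration. Real marketplace pages differ per site and change often, and in practice one of the libraries listed above would usually do this work.

```python
from html.parser import HTMLParser

# Hypothetical markup; every real site structures its review pages differently.
SAMPLE_HTML = """
<div class="review"><span class="rating">4</span><p class="body">Very light headphones.</p></div>
<div class="review"><span class="rating">2</span><p class="body">It is cheaply made.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collect {'rating': int, 'body': str} dicts from the sample markup."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "review":
            self.reviews.append({})  # start a new review record
        elif css_class in ("rating", "body") and self.reviews:
            self._field = css_class

    def handle_data(self, data):
        value = data.strip()
        if self._field and value:
            self.reviews[-1][self._field] = int(value) if self._field == "rating" else value

    def handle_endtag(self, tag):
        self._field = None  # text after a closing tag belongs to no field


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
reviews = parser.reviews
```

Because the class names are hard-coded, a parser like this breaks as soon as the page layout changes, which is the maintenance burden described earlier.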
Step 2: Review and Rating Analysis
In the context of review analysis, the biggest problems to solve are Aspect Classification (identifying the topic being spoken about) and Sentiment Classification (inferring whether the user is complaining or praising).
Natural Language Processing (NLP) is used to understand the text. NLP is a field that deals with extracting information from unstructured text. The field has experienced a renaissance in the past few years with the application of Neural Networks and Transfer Learning.
Neural Networks are a class of Machine Learning models that take inspiration from the inner workings of the brain to solve problems. Transfer Learning is a technique that leverages learning from one domain to solve problems in another.
Among Neural Networks, Convolutional Neural Networks, LSTMs and, more recently, Transformers are state-of-the-art techniques that can be used to solve problems like sentiment analysis and topic detection. Most neural networks these days are built with either PyTorch or TensorFlow.
The most popular modern open source libraries used are:
The machine learning process usually looks like the following:
Typical Machine Learning process
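One common version of this workflow (prepare labelled examples, train a model, evaluate it on held-out data, then predict) can be sketched in a few lines. The steps themselves are an assumption, since the text does not spell them out, and a small Naive Bayes classifier in plain Python is swapped in for the neural networks discussed above to keep the example self-contained. The dataset and function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: labelled data (tiny, hand-written, purely illustrative).
TRAIN = [
    ("great sound and very comfortable", "positive"),
    ("love the battery life", "positive"),
    ("excellent value works great", "positive"),
    ("stopped working after a week", "negative"),
    ("poor quality and cheaply made", "negative"),
    ("terrible battery very disappointed", "negative"),
]
TEST = [
    ("great battery life", "positive"),
    ("poor sound and stopped working", "negative"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Step 2: count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Step 4: return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
# Step 3: evaluate on examples the model has not seen during training.
accuracy = sum(predict(model, text) == label for text, label in TEST) / len(TEST)
```

A real project would use far more data and a proper validation split, but the shape of the loop stays the same whichever model is plugged in.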
Using an off-the-shelf SaaS solution usually offloads these steps and can save significant development costs.
Step 3: Report Generation
The first step here is to figure out the right views and visualizations required to get the most out of the data. This is often the most challenging part, and it is frequently overlooked.
The data is then transferred to a visualization platform like Tableau, Qlik, ChartIO or Google Data Studio.
Instead of building a platform like this in-house with data scientists, engineers and data analysts, brands might be better served by a solution like FreeText AI.
FreeText AI is a turnkey solution for brands to gather all their feedback in one central location and make sense of it. Years have been invested in creating the right toolset, so that you don’t have to.
With FreeText AI, brands do not have to worry about data gathering, analysis and report generation. Instead, all the data is exposed through a simple, easy-to-use dashboard.
To add a source in the FreeText AI dashboard, you just have to provide the URLs of the products.
Add products by URL
Alternatively, the entire category can be automatically added instead of adding individual product SKUs.
FreeText AI is designed with scale in mind, so entire categories of products with hundreds of thousands of reviews can be analyzed.
Analysis comprises review sentiment analysis and topic detection using automated machine learning models.
Once the products are added and processed (which takes a few minutes depending on the number of products), the data is ready to be explored!
Let’s take a look at the overview dashboard for an individual product:
Product view in FreeText AI Dashboard
The goal of the product overview dashboard is to make all the high-level and root-cause analysis available in one go. A quick glance at the dashboard answers the following questions:
- How many reviews came in for the given time period?
- What was the rating breakdown (both star rating and review rating)?
Note that in many cases there can be a difference between the star rating and the review rating. Review ratings tend to be more detailed and thought through, and hence in many cases they would be a lower score.
- What is the overall sentiment trend?
- What are the topics that people are talking about?
- What are the top positive and top negative topics that have been spoken about?
We can also narrow the report to only negative reviews, only positive reviews, or a particular time period.
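The questions above boil down to a few counts over review records. The following is a minimal sketch, assuming each review has already been labelled with a date, a star rating, a sentiment and a list of topics. The field names, the sample records and the `summarize` function are hypothetical and say nothing about how FreeText AI computes its dashboard.

```python
from collections import Counter
from datetime import date

# Hypothetical, already-analyzed review records.
REVIEWS = [
    {"date": date(2023, 1, 5), "stars": 5, "sentiment": "positive", "topics": ["sound"]},
    {"date": date(2023, 1, 20), "stars": 2, "sentiment": "negative", "topics": ["battery"]},
    {"date": date(2023, 2, 3), "stars": 4, "sentiment": "positive", "topics": ["sound", "comfort"]},
    {"date": date(2023, 2, 14), "stars": 1, "sentiment": "negative", "topics": ["battery", "build"]},
]


def summarize(reviews, start, end, sentiment=None):
    """Summarize reviews dated within [start, end], optionally for one sentiment."""
    selected = [
        r for r in reviews
        if start <= r["date"] <= end and (sentiment is None or r["sentiment"] == sentiment)
    ]
    topic_counts = Counter(topic for r in selected for topic in r["topics"])
    return {
        "review_count": len(selected),
        "star_breakdown": dict(Counter(r["stars"] for r in selected)),
        "top_topics": topic_counts.most_common(3),
    }


overview = summarize(REVIEWS, date(2023, 1, 1), date(2023, 2, 28))
negatives_only = summarize(REVIEWS, date(2023, 1, 1), date(2023, 2, 28), sentiment="negative")
```

Passing a narrower date range or a `sentiment` value re-runs the same summary on a subset, which mirrors the filtering described above.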
Zooming out, there could be trends on a product group - say a brand or a sub-category level - that could reveal interesting consumer patterns as well. Trends that might not be obvious on a product level could reveal themselves while looking at the category as a whole.
Product Group Report
Another way to look at this is to zoom in and understand what exactly users are saying. This is where the Explore view comes in - this view focuses on exploring just what the users are saying in the text.
Topics Explore Table
As you can see, all the topics and sub-topics mentioned are categorized. The sentiment and volume bars show at a glance what the trend has been and what the positive/negative split of the mentions has been.
Clicking on a topic opens all the mentions inline, with highlights indicating why the AI classified a particular text into the topic. This also allows us to read the text verbatim, exactly as the user stated it.
Explore Mentions and Highlights
In addition, you can search for specific keywords or terms and compare product groups, with the same volume trend and sentiment split reports available for exploration.
Let’s now take a look at some of the challenges of doing review analysis.
Data gathering challenges
The first step in review analysis is to gather the required data. This data may be spread across multiple places. For example, a brand could have some Amazon reviews, plus reviews on their listings on BestBuy, Target and their own D2C website. Typical challenges involve gathering reviews at scale, dealing with technical blockades, etc. Another major challenge is keeping up with the changes made to the marketplace websites, which can be very frequent at times. Yet another is ensuring that the data is complete and converted to a standardized format - which is trickier than it sounds.
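Converting records from different sources into one standardized format might look like the sketch below. The per-source field names (`star_rating`, `reviewText`, `score` and so on), the `Review` dataclass and the sample records are invented for illustration and do not reflect any marketplace's actual data format.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Review:
    source: str
    product_id: str
    rating: float  # normalized to a 1-5 scale
    text: str
    posted: date


def from_marketplace_a(raw: dict) -> Review:
    # Hypothetical source A: 1-5 stars as strings, ISO dates.
    return Review("marketplace_a", raw["sku"], float(raw["star_rating"]),
                  raw["reviewText"].strip(), date.fromisoformat(raw["date"]))


def from_marketplace_b(raw: dict) -> Review:
    # Hypothetical source B: 0-10 scores, US-style dates.
    rating = 1 + 4 * raw["score"] / 10  # map 0-10 linearly onto 1-5
    posted = datetime.strptime(raw["posted_on"], "%m/%d/%Y").date()
    return Review("marketplace_b", raw["item"], rating, raw["comment"].strip(), posted)


# One adapter per source; adding a marketplace means adding one function here.
ADAPTERS = {"marketplace_a": from_marketplace_a, "marketplace_b": from_marketplace_b}


def normalize(source: str, raw: dict) -> Review:
    return ADAPTERS[source](raw)


a = normalize("marketplace_a", {"sku": "A1", "star_rating": "4",
                                "reviewText": " Great sound ", "date": "2023-03-01"})
b = normalize("marketplace_b", {"item": "B7", "score": 5,
                                "comment": "Average", "posted_on": "03/15/2023"})
```

Keeping the rating scale and date handling inside each adapter means everything downstream can assume a single schema.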
Text Analytics challenges
The next challenge is around text analytics. Sentiment analysis might seem straightforward until you consider the edge cases. For example, is being “light” a good thing or a bad thing? “The product was too light” could be a good thing if it is, say, a laptop, and a bad thing if it is a paperweight. Likewise, the same word “light” has many senses - a noun as in “LED light”, or a verb as in “light a fire”. Natural languages are full of such ambiguities, and not handling them correctly can lead to disastrous outcomes.
Even after gathering the data and analyzing the text, the problem is not fully solved. Then comes the next step - reporting. Most people understand that reporting involves creating visualizations that make sense. But it doesn’t end there - perhaps the most important aspect of reporting, and one that is often overlooked, is Aggregation. Aggregation involves combining entities and units into groups that make sense for reporting. This can be tricky, especially if you consider that the data we are dealing with can be Big Data - with hundreds of thousands of pieces of feedback. Another tricky part of reporting is understanding the impact of each insight from the feedback.
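Aggregation can be sketched as rolling individual mentions up to a coarser group. Below, product-level mentions are grouped by brand and topic. The `PRODUCT_TO_BRAND` mapping, the sample mentions, and the use of negative mention volume as a stand-in for impact are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from product SKU to the group used for reporting.
PRODUCT_TO_BRAND = {"sku-1": "BrandA", "sku-2": "BrandA", "sku-3": "BrandB"}

# Hypothetical analyzed mentions: (sku, topic, sentiment).
MENTIONS = [
    ("sku-1", "battery", "negative"),
    ("sku-2", "battery", "negative"),
    ("sku-2", "sound", "positive"),
    ("sku-3", "battery", "positive"),
    ("sku-1", "sound", "negative"),
    ("sku-1", "battery", "negative"),
]


def aggregate(mentions, grouping):
    """Count positive/negative mentions per (group, topic)."""
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for sku, topic, sentiment in mentions:
        totals[(grouping[sku], topic)][sentiment] += 1
    return dict(totals)


def top_negative(totals, group, n=3):
    """Rank a group's topics by negative mention count, highest first."""
    rows = [(topic, counts["negative"]) for (g, topic), counts in totals.items() if g == group]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


brand_totals = aggregate(MENTIONS, PRODUCT_TO_BRAND)
brand_a_issues = top_negative(brand_totals, "BrandA")
```

Swapping `PRODUCT_TO_BRAND` for a SKU-to-sub-category mapping yields the same report at a different level, without touching the counting logic.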
A tool like FreeText AI was explicitly built with these considerations in mind, and uses the core engineering expertise of its team to get around some of these challenges.
Understanding reviews is key to unlocking customer value. Modern review analysis tools leverage technologies like Neural Networks to enable mining insights from marketplace reviews.