How to Perform Exploratory Data Analysis on B2B ECommerce Data
One of the greatest advantages of performing exploratory data analysis on B2B Ecommerce data is that it allows businesses to discover transactional patterns across different customers, along with their demographics, firmographics, and other insights.
Exploratory Data Analysis (EDA) is an approach to analyzing data sets that summarizes their main characteristics, often using statistical graphs and other visualization methods.
Exploratory Data Analysis (EDA) benefits B2B eCommerce businesses in the following ways:
- It allows business stakeholders to ask the right questions by exploring and visualizing data and allowing them to validate their business assumptions through constant investigation
- It allows business stakeholders to spot potential data anomalies before wrong data is fed to a machine learning model
- Businesses can interpret the model output and test its assumptions
How to Use EDA to Discover Interesting Patterns in Business Data
Let’s begin with a data source. Readers can refer to the notebook for a more detailed look at the technical intricacies below the surface.
The process of analyzing data primarily involves the following four steps:
- Context of Data
- Data Cleaning or Data Preprocessing
- Exploratory Data Analysis
- Deriving Results
So let’s delve in:
1. Context of Data
The sample E-Commerce dataset for this post has been obtained from Kaggle. Before working with the dataset, marketers should first understand the context in which the data was collected.
To begin with, this dataset consists of transactional data and refers to customers from different countries who purchase from an online retail store based in the United Kingdom (UK) that sells gifts for all occasions. The information is summarized below:
- Company – UK-based and registered non-store online retail
- Products for selling – Gifts for all occasions
- Customers – Most are wholesalers (local or international)
- The overall transaction period was one year
2. Data Cleaning or Data Pre-processing
Data cleaning is the most important part of data analysis. Real-world data is messy, and one has to put effort into the ETL (extraction, transformation, and loading) of data. Below is the snapshot of the original data:
The following variables have been used in the dataframe and here’s what they mean:
InvoiceNo (invoice_num): A number assigned to each transaction
StockCode (stock_code): Product code
Description (description): Product name
Quantity (quantity): Number of products purchased for each transaction
InvoiceDate (invoice_date): Timestamp for each transaction
UnitPrice (unit_price): Product price per unit
CustomerID (cust_id): Unique identifier for each customer
Country (country): Country name
Note – The product price per unit is assumed to follow the same currency throughout the analysis process.
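The renaming described in the list above can be sketched with pandas. This is only an illustration under stated assumptions: the two sample rows are made up (the real data would be read from the Kaggle file), and parsing the timestamp with `pd.to_datetime` is our choice rather than something the post specifies.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle file (values are made up).
raw = pd.DataFrame({
    "InvoiceNo": ["A1", "A2"],
    "StockCode": ["S001", "S002"],
    "Description": ["ITEM A", "ITEM B"],
    "Quantity": [6, 2],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:28"],
    "UnitPrice": [2.5, 1.25],
    "CustomerID": [17850.0, 13047.0],
    "Country": ["United Kingdom", "France"],
})

# Map the original headers to the snake_case names given in parentheses above.
df = raw.rename(columns={
    "InvoiceNo": "invoice_num",
    "StockCode": "stock_code",
    "Description": "description",
    "Quantity": "quantity",
    "InvoiceDate": "invoice_date",
    "UnitPrice": "unit_price",
    "CustomerID": "cust_id",
    "Country": "country",
})

# Parse the timestamp so that date parts can be extracted later (assumption).
df["invoice_date"] = pd.to_datetime(df["invoice_date"])
```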
A Protocol to Check the Missing Values
So far, so good. Some values are missing for CustomerID and Description, so the rows with any of these missing values must be removed.
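A minimal sketch of this check, assuming the snake_case column names `description` and `cust_id` and a made-up four-row sample, could look like this:

```python
import pandas as pd

# Hypothetical sample with one missing description and one missing customer ID.
df = pd.DataFrame({
    "description": ["ITEM A", None, "ITEM C", "ITEM D"],
    "cust_id": [17850.0, 13047.0, None, 12583.0],
    "quantity": [6, 2, 1, 3],
})

# Count the missing values in each column.
missing = df.isnull().sum()

# Keep only the rows where both description and cust_id are present.
df_clean = df.dropna(subset=["description", "cust_id"])
```

Restricting `dropna` to the two columns of interest avoids discarding rows that are incomplete only in columns the analysis does not use.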
By understanding the data in a more descriptive manner, we notice two things:
- Quantity has negative values
- The Unit Price has zero values (Free items?)
Rows with negative Quantity values are removed. Unit Price entries with zero values are examined in a later part of the analysis.
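As a rough illustration of this step, the summary statistics and the filter can be written as follows. The sample values are invented, and the column names `quantity` and `unit_price` are assumed from the variable list.

```python
import pandas as pd

# Hypothetical sample containing one negative quantity and one zero unit price.
df = pd.DataFrame({
    "quantity": [6, -1, 3, 12],
    "unit_price": [2.5, 5.0, 0.0, 1.25],
})

# Descriptive statistics; the "min" row exposes negative or zero values.
summary = df[["quantity", "unit_price"]].describe()

# Keep only the rows with a positive quantity.
df_pos = df[df["quantity"] > 0]
```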
To calculate the total money spent on each purchase, we simply multiply Quantity by Unit Price:
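In pandas this is a single column-wise multiplication. The column name `amount_spent` and the sample values below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "quantity": [6, 2],
    "unit_price": [2.5, 1.25],
})

# Total money spent per row = quantity * unit price.
df["amount_spent"] = df["quantity"] * df["unit_price"]
```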
Finally, we add a few columns that consist of the Year_Month, Month, Day, and Hour for each transaction for analysis later. The final draft looks like this:
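One possible way to derive these columns is through the pandas `.dt` accessor. The lower-case column names, the integer `YYYYMM` encoding of the year-month, and the Monday=1 day numbering are all assumptions of this sketch; the sample timestamps are made up.

```python
import pandas as pd

# Hypothetical sample timestamps.
df = pd.DataFrame({
    "invoice_date": pd.to_datetime(["2010-12-01 08:26", "2011-11-15 12:05"]),
})

# Year and month combined into one integer, e.g. 201012 (assumed encoding).
df["year_month"] = df["invoice_date"].dt.year * 100 + df["invoice_date"].dt.month
df["month"] = df["invoice_date"].dt.month
# dayofweek is 0 for Monday; shift by one so that Monday=1 ... Sunday=7.
df["day"] = df["invoice_date"].dt.dayofweek + 1
df["hour"] = df["invoice_date"].dt.hour
```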
3. Exploratory Data Analysis
In B2B E-Commerce, everyone wants to know where customers come from, which of them place the most orders, and which spend the most money, as these customers drive the company's sales.
From the results, it can be inferred that the customer who places the most orders is from the UK, while the customer who spends the most money on purchases is from the Netherlands.
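A ranking of this kind can be sketched with a `groupby` per customer. The data below is invented purely to show the mechanics, and `amount_spent` is an assumed name for the Quantity times Unit Price column.

```python
import pandas as pd

# Hypothetical sample: customer 1 orders often, customer 2 spends the most.
df = pd.DataFrame({
    "invoice_num": ["A1", "A2", "A3", "B1", "C1"],
    "cust_id": [1, 1, 1, 2, 3],
    "country": ["United Kingdom"] * 3 + ["Netherlands", "France"],
    "amount_spent": [10.0, 20.0, 5.0, 500.0, 30.0],
})

# Number of distinct invoices per customer, largest first.
orders_per_customer = (
    df.groupby(["cust_id", "country"])["invoice_num"]
    .nunique()
    .sort_values(ascending=False)
)

# Total money spent per customer, largest first.
spend_per_customer = (
    df.groupby(["cust_id", "country"])["amount_spent"]
    .sum()
    .sort_values(ascending=False)
)
```

Counting distinct invoice numbers rather than rows matters here, because one invoice usually spans several product lines.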
Extrapolative Insights into the Number of Orders Being Placed Each Month
The company receives the highest number of orders in November 2011. December 2011 cannot be compared fairly with the other months, because the dataset only covers transactions up to 9th December 2011.
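The monthly count behind such a chart could be computed as below. The `year_month` column (an integer such as 201111) and the sample invoices are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample; invoice A1 has two product lines.
df = pd.DataFrame({
    "invoice_num": ["A1", "A1", "A2", "A3", "A4"],
    "year_month": [201110, 201110, 201111, 201111, 201112],
})

# Distinct invoices per month.
orders_per_month = df.groupby("year_month")["invoice_num"].nunique()

# Month with the most orders in this sample.
busiest = orders_per_month.idxmax()
```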
Gaining Insights into the Number of Orders Being Placed Per Day
Surprisingly, no transactions happen on Saturdays throughout the entire one-year period. There could be several reasons for this; one possibility is that the shop is closed on Saturdays, so no transactions can take place.
There’s a trend, however, in the dataset wherein the number of orders received by the company tends to increase from Monday to Thursday and decreases afterward.
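A sketch of the per-day count follows, assuming a Monday=1 to Sunday=7 numbering and made-up invoices. Reindexing over all seven days keeps days without any orders visible as zeros, which is how a gap such as the missing Saturday would show up.

```python
import pandas as pd

# Hypothetical sample: one order on Monday, one on Tuesday, two on Thursday.
df = pd.DataFrame({
    "invoice_num": ["A1", "A2", "A3", "A4"],
    "invoice_date": pd.to_datetime([
        "2011-11-14 10:00", "2011-11-15 11:00",
        "2011-11-17 12:00", "2011-11-17 13:00",
    ]),
})

# Monday=1 ... Sunday=7 (assumed convention).
df["day"] = df["invoice_date"].dt.dayofweek + 1

# Distinct invoices per weekday; days with no orders are filled with 0.
orders_per_day = (
    df.groupby("day")["invoice_num"]
    .nunique()
    .reindex(range(1, 8), fill_value=0)
)
```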
Gaining Insights into the Number of Orders per Hour
In terms of hours, there are no transactions after 8:00 pm until the next day at 6:00 am.
Besides, one can notice that the company receives the highest number of orders at 12:00 pm. One of the reasons could be that most customers make purchases during lunch hours between 12:00 pm – 2:00 pm.
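The hourly breakdown can be approximated in the same way. The `hour` column (0 to 23) and the sample rows are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical sample; invoice A3 has two product lines at noon.
df = pd.DataFrame({
    "invoice_num": ["A1", "A2", "A3", "A3", "A4"],
    "hour": [9, 12, 12, 12, 15],
})

# Distinct invoices per hour of the day.
orders_per_hour = df.groupby("hour")["invoice_num"].nunique()

# Hour with the most orders in this sample.
peak_hour = orders_per_hour.idxmax()
```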
Discovering Transactional Patterns for Unit Price
Before we move our attention to the zero values (FREE items) of unit price, we make a boxplot to check the distribution of the unit price for all products.
It is observed that 75% of the data has a unit price of less than $3.75, indicating that most of the products are relatively cheap. Only a minority of them have high prices per unit (we proceed with the assumption that each price per unit follows the same currency).
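The figure a boxplot summarizes here is the third quartile, which can also be computed directly. The prices below are invented and the 3.75 threshold is taken from the text; this is a sketch, not the original notebook code.

```python
import pandas as pd

# Hypothetical unit prices with one expensive outlier.
df = pd.DataFrame({
    "unit_price": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 50.0],
})

# Third quartile: 75% of the prices lie at or below this value.
q3 = df["unit_price"].quantile(0.75)

# Share of rows priced below the 3.75 threshold mentioned in the text.
share_below = (df["unit_price"] < 3.75).mean()

# With matplotlib installed, df.boxplot(column="unit_price") draws the plot.
```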
Frequency of Giving Out Free Items for Purchase for Different Months
The plot above suggests that the company occasionally gives out free items with purchases every month except June 2011. However, it is hard to determine what factors contribute to giving out FREE items to particular customers.
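Counting free items per month amounts to filtering on a zero unit price and grouping by month. In this sketch the `year_month` column and the sample rows are assumed; reindexing over every month in the data keeps months without free items in the result as zeros.

```python
import pandas as pd

# Hypothetical sample: free items in May and July, none in June.
df = pd.DataFrame({
    "year_month": [201105, 201105, 201106, 201107],
    "unit_price": [0.0, 2.0, 1.5, 0.0],
})

# Rows where the item was given away for free.
free = df[df["unit_price"] == 0]

# Count per month, keeping months with no free items as 0.
all_months = sorted(df["year_month"].unique())
free_per_month = free.groupby("year_month").size().reindex(all_months, fill_value=0)
```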
Discovering Transactional Patterns for Each Country
Top 5 Countries with Maximum Number of Orders
As expected, the company receives the highest number of orders in the UK (since it is a UK-based company).
For a clearer comparison among the other countries, the UK is then removed from the plot. The top 5 countries (including the UK) that place the highest number of orders are:
- United Kingdom
- Germany
- France
- Ireland (EIRE), and
- Spain
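Ranking countries with and without the UK can be sketched as follows. The invoices are made up, so the resulting order only illustrates the mechanics and does not reproduce the real ranking.

```python
import pandas as pd

# Hypothetical sample invoices.
df = pd.DataFrame({
    "invoice_num": ["A1", "A2", "A3", "B1", "B2", "C1"],
    "country": ["United Kingdom"] * 3 + ["Germany"] * 2 + ["France"],
})

# Distinct invoices per country, largest first.
orders_by_country = (
    df.groupby("country")["invoice_num"].nunique().sort_values(ascending=False)
)

# Top five including the UK, and again with the UK dropped for comparison.
top_with_uk = orders_by_country.head(5)
top_without_uk = orders_by_country.drop("United Kingdom").head(5)
```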
The Top Five Countries That Spent the Most
As the company received the highest number of orders from customers in the UK, it is natural to see that customers in the UK spend the most on their purchases.
The UK is again removed for a clearer comparison among the other countries. The top five countries (including the UK) that spent the maximum amount of money on purchases were:
- United Kingdom
- Netherlands
- Ireland (EIRE)
- Germany, and
- France
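The spending comparison follows the same pattern, summing an amount column per country. The column name `amount_spent` and the figures below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample amounts.
df = pd.DataFrame({
    "country": ["United Kingdom", "United Kingdom", "Netherlands", "Germany"],
    "amount_spent": [100.0, 200.0, 150.0, 40.0],
})

# Total money spent per country, largest first.
spend_by_country = (
    df.groupby("country")["amount_spent"].sum().sort_values(ascending=False)
)

# The same ranking with the UK removed for a clearer comparison.
spend_without_uk = spend_by_country.drop("United Kingdom")
```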
Inferences Drawn from EDA
- The customer with the highest number of orders comes from the United Kingdom (UK)
- The customer with the maximum amount of money spent on purchases belongs to the Netherlands
- The company, being UK-based, receives the highest number of orders from the UK. The top 5 countries (including the UK) that place the highest number of orders are the United Kingdom, Germany, France, Ireland (EIRE), and Spain.
- Overall, customers in the UK spend the most on purchases; the top five countries are the United Kingdom, Netherlands, Ireland (EIRE), Germany, and France
- The month that recorded the highest number of orders was November 2011. The month with the lowest number of orders cannot be determined, as the dataset only covers transactions up to 9th December 2011
- There are no transactions on Saturdays between 1st December 2010 and 9th December 2011
- The number of orders received by the company increases from Monday to Thursday and decreases thereafter
- The company receives the highest number of orders at 12:00 pm. Most of the customers made purchases during lunch hour between 12:00 pm – 2:00 pm
- The company tends to give out Free items for purchases occasionally each month (except June). However, it is unclear what factors contribute to giving out the Free items to particular customers.
Benefits of Performing Exploratory Data Analysis on B2B Ecommerce Business Data
As the examples above show, EDA is the process of visualizing and analyzing data to extract insights from it. EDA summarizes important characteristics of data in order to gain a better understanding of the dataset. With the help of EDA, marketers can:
- Maximize their insights into a data set
- Uncover the underlying structure of data and connect the dots to tell better data-driven stories and scale up their business endeavors
- Extract the important variables, and exclude or normalize missing values once outliers and anomalies are detected
- Test several hypotheses about sales trends for better trend mining and to establish a formula for success
- Develop parsimonious models and determine optimal factor settings
Performing EDA on B2B Ecommerce business data allows brands to identify interesting results in the data subjected to analysis. This analysis can be used to validate business assumptions and establish hypotheses. EDA also helps in better interpreting a machine learning model’s output. EDA works best when the data-driven insights are blended with business understanding and real-world scenarios for a better interpretation of the story that the data tells.