How to Perform Exploratory Data Analysis on B2B ECommerce Data
One of the greatest advantages of performing exploratory data analysis on B2B eCommerce data is that it lets businesses discover transactional patterns across different customers, along with their demographics, firmographics, and other insights.
Exploratory Data Analysis (EDA) is an approach to analyzing data sets that summarizes their main characteristics, often using statistical graphs and other visualization methods.
Exploratory Data Analysis (EDA) benefits B2B eCommerce businesses in the following ways:
Let’s begin with a data source. Readers can refer to the notebook for a more detailed look at the technical intricacies below the surface.
The process of analyzing data primarily involves the following five approaches, to begin with:
So let’s delve in:
1. Context of Data
The sample E-Commerce dataset for this post has been obtained from Kaggle. Before working with the dataset, marketers should first understand the context of the data.
This dataset consists of transactional data and refers to customers from different countries who purchase from an online retail store based in the United Kingdom (UK) that sells gifts for all occasions. The information is summarized below:
2. Data Cleaning or Data Pre-processing
Data cleaning is the most important part of data analysis. Real-world data is messy, and the ETL (extraction, transformation, and loading) of data takes effort. Below is a snapshot of the original data:
The following variables have been used in the dataframe and here’s what they mean:
InvoiceNo (invoice_num): A number assigned to each transaction
StockCode (stock_code): Product code
Description (description): Product name
Quantity (quantity): Number of products purchased for each transaction
InvoiceDate (invoice_date): Timestamp for each transaction
UnitPrice (unit_price): Product price per unit
CustomerID (cust_id): Unique identifier for each customer
Country (country): Country name
Note – The product price per unit is assumed to follow the same currency throughout the analysis process.
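The renaming of the original columns to the shorter names in parentheses could be sketched in pandas as follows. The two sample rows are invented for illustration and stand in for the Kaggle file, whose path and encoding are not given in this post; only the column names come from the list above.

```python
import io
import pandas as pd

# Invented sample rows in the dataset's original column layout (not real data).
raw_csv = io.StringIO(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "100001,A001,SAMPLE GIFT BOX,6,12/1/2010 8:26,2.50,90001,United Kingdom\n"
    "100002,B002,SAMPLE ALARM CLOCK,24,12/1/2010 8:45,3.75,90002,France\n"
)

# Read identifiers as strings so that leading zeros or letters are preserved.
df = pd.read_csv(raw_csv, dtype={"InvoiceNo": str, "StockCode": str, "CustomerID": str})

# Map the original column names to the snake_case names used in the post.
column_map = {
    "InvoiceNo": "invoice_num",
    "StockCode": "stock_code",
    "Description": "description",
    "Quantity": "quantity",
    "InvoiceDate": "invoice_date",
    "UnitPrice": "unit_price",
    "CustomerID": "cust_id",
    "Country": "country",
}
df = df.rename(columns=column_map)

# Parse the timestamp column; the month/day/year format is an assumption.
df["invoice_date"] = pd.to_datetime(df["invoice_date"], format="%m/%d/%Y %H:%M")

print(df.dtypes)
```

With the real file, the `io.StringIO` object would be replaced by the path to the downloaded CSV.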
A Protocol to Check the Missing Values
So far, so good. Some values are missing for CustomerID and Description, so the rows with any of these missing values must be removed.
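One way to run this check in pandas is to count the nulls per column and then drop the affected rows. This is a minimal sketch on an invented four-row dataframe; the column names follow the renamed variables above, and the values are not taken from the dataset.

```python
import pandas as pd

# Invented sample with gaps in description and cust_id (not real data).
df = pd.DataFrame({
    "invoice_num": ["100001", "100002", "100003", "100004"],
    "description": ["GIFT BOX", None, "MUG", "CARD"],
    "cust_id": ["90001", "90001", None, "90002"],
    "quantity": [6, 2, 1, 12],
})

# Count the missing values in every column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Keep only the rows where both cust_id and description are present.
df_clean = df.dropna(subset=["cust_id", "description"])
print(len(df), "rows before,", len(df_clean), "rows after")
```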
By examining the descriptive statistics of the data, we notice two things: Quantity contains negative values, and Unit Price contains zero values.
Rows with negative Quantity values are removed. Unit Prices with zero values are explained in the latter part.
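The descriptive summary and the filter on Quantity might look like this in pandas. The sample values are invented, and the decision to keep zero-priced rows at this stage is an assumption based on the post returning to them later.

```python
import pandas as pd

# Invented sample containing one negative quantity and one zero unit price.
df = pd.DataFrame({
    "quantity": [6, -6, 12, 1],
    "unit_price": [2.5, 2.5, 0.0, 3.75],
})

# describe() exposes the minimum of each column, which reveals both issues.
summary = df[["quantity", "unit_price"]].describe()
print(summary.loc[["min", "max"]])

# Drop the rows with negative quantities.
df_positive = df[df["quantity"] > 0]

# Zero-priced rows are kept for now and only counted.
n_free = int((df_positive["unit_price"] == 0).sum())
print(n_free, "zero-priced rows remain")
```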
To calculate the total money spent on each purchase, we simply multiply Quantity by Unit Price:
Finally, we add a few columns containing the Year_Month, Month, Day, and Hour of each transaction for later analysis. The final dataframe looks like this:
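A sketch of these derived columns in pandas is given here. The column name `amount_spent`, the integer encoding of `year_month`, and the 1 = Monday convention for `day` are assumptions made for illustration; the sample rows are invented.

```python
import pandas as pd

# Invented sample rows (not real data).
df = pd.DataFrame({
    "quantity": [6, 24],
    "unit_price": [2.5, 3.75],
    "invoice_date": pd.to_datetime(["2010-12-01 08:26", "2011-11-04 12:05"]),
})

# Total money spent per row: quantity multiplied by unit price.
df["amount_spent"] = df["quantity"] * df["unit_price"]

# Date parts for later grouping.
df["year_month"] = df["invoice_date"].dt.year * 100 + df["invoice_date"].dt.month
df["month"] = df["invoice_date"].dt.month
df["day"] = df["invoice_date"].dt.dayofweek + 1  # 1 = Monday ... 7 = Sunday
df["hour"] = df["invoice_date"].dt.hour

print(df)
```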
3. Exploratory Data Analysis
In B2B E-Commerce, everyone wants to know where customers come from, which of them place the most orders, and which spend the most money, since these customers drive the company’s sales.
From the results, it can be inferred that the customers who place the most orders are in the UK, while the customers who spend the most on their purchases are in the Netherlands.
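A per-customer ranking of this kind can be sketched with a pandas groupby. The data below is invented purely to make the example run, and `amount_spent` is an assumed name for the Quantity times Unit Price column.

```python
import pandas as pd

# Invented sample: customer A orders often, customer B spends the most.
df = pd.DataFrame({
    "invoice_num": ["1", "2", "3", "4", "5"],
    "cust_id": ["A", "A", "A", "B", "C"],
    "country": ["United Kingdom"] * 3 + ["Netherlands", "France"],
    "amount_spent": [10.0, 20.0, 5.0, 500.0, 30.0],
})

# One row per customer: distinct invoices and total money spent.
per_customer = df.groupby(["cust_id", "country"], as_index=False).agg(
    orders=("invoice_num", "nunique"),
    money_spent=("amount_spent", "sum"),
)

# The top customer by each measure.
top_by_orders = per_customer.sort_values("orders", ascending=False).iloc[0]
top_by_spend = per_customer.sort_values("money_spent", ascending=False).iloc[0]

print(top_by_orders["cust_id"], top_by_orders["country"])
print(top_by_spend["cust_id"], top_by_spend["country"])
```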
The company receives the highest number of orders in November 2011. The data for December 2011 is incomplete, so that month cannot be compared directly with the others.
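Monthly order counts like these can be computed by grouping distinct invoices by calendar month. The sketch below uses invented rows; checking the last date covered is a simple way to spot a partial final month.

```python
import pandas as pd

# Invented sample; invoice "1" has two line items and must count once.
df = pd.DataFrame({
    "invoice_num": ["1", "1", "2", "3", "4"],
    "invoice_date": pd.to_datetime(
        ["2011-10-03", "2011-10-03", "2011-11-07", "2011-11-21", "2011-12-05"]
    ),
})

# Group by calendar month and count distinct invoices.
df["year_month"] = df["invoice_date"].dt.to_period("M")
orders_per_month = df.groupby("year_month")["invoice_num"].nunique()
busiest_month = orders_per_month.idxmax()

# The last date in the data shows whether the final month is fully covered.
last_day_covered = df["invoice_date"].max()

print(orders_per_month)
print("Busiest month:", busiest_month, "| data ends:", last_day_covered.date())
```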
Surprisingly, no transactions take place on Saturdays throughout the entire one-year period. There could be many reasons for this, one of which could be that the shop is closed on weekends, so no transactions can take place.
There’s a trend, however, in the dataset wherein the number of orders received by the company tends to increase from Monday to Thursday and decreases afterward.
In terms of hours, there are no transactions after 8:00 pm until the next day at 6:00 am.
Besides, one can notice that the company receives the highest number of orders at 12:00 pm. One of the reasons could be that most customers make purchases during lunch hours between 12:00 pm – 2:00 pm.
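Gaps such as a weekday or a block of hours with no orders only show up if the empty slots are kept in the result. One way to do that, sketched here on invented data, is to reindex the counts over every possible value. The helper `orders_by` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def orders_by(df, key, all_values):
    """Count distinct invoices per key, keeping empty slots as zero."""
    counts = df.groupby(key)["invoice_num"].nunique()
    return counts.reindex(list(all_values), fill_value=0)

# Invented sample rows (not real data).
df = pd.DataFrame({
    "invoice_num": ["1", "2", "3", "4"],
    "invoice_date": pd.to_datetime([
        "2011-11-07 12:10",  # Monday
        "2011-11-10 12:45",  # Thursday
        "2011-11-10 15:00",  # Thursday
        "2011-11-11 09:30",  # Friday
    ]),
})
df["day"] = df["invoice_date"].dt.dayofweek + 1  # 1 = Monday ... 7 = Sunday
df["hour"] = df["invoice_date"].dt.hour

orders_per_day = orders_by(df, "day", range(1, 8))
orders_per_hour = orders_by(df, "hour", range(24))

# Days with no orders and the busiest hour of the day.
days_without_orders = orders_per_day[orders_per_day == 0].index.tolist()
peak_hour = orders_per_hour.idxmax()

print("Days without orders:", days_without_orders)
print("Peak hour:", peak_hour)
```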
Before we move our attention to the zero values (FREE items) of unit price, we make a boxplot to check the distribution of the unit price for all products.
It is observed that 75% of the data has a unit price of less than $3.75, indicating that most of the products are relatively cheap. Only a minority of them have high prices per unit (we proceed with the assumption that every unit price is in the same currency).
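The 75% figure corresponds to the third quartile, which is the upper edge of the box in a boxplot. A minimal sketch of reading it directly from the data, using an invented price series, could look like this:

```python
import pandas as pd

# Invented unit prices with one expensive outlier (not real data).
unit_price = pd.Series([0.5, 1.0, 2.0, 3.0, 50.0], name="unit_price")

# The quartiles that a boxplot draws as the box and its middle line.
q1, median, q3 = unit_price.quantile([0.25, 0.5, 0.75])

# Share of products priced at or below the third quartile.
share_at_or_below_q3 = (unit_price <= q3).mean()

print("Q1:", q1, "median:", median, "Q3:", q3)
print("Share at or below Q3:", share_at_or_below_q3)
```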
The plot above suggests that the company occasionally gives out free items, in every month except June 2011. However, it is hard to tell what factors lead to particular customers receiving FREE items.
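Counting zero-priced rows per month, and keeping the months where the count is zero, is one way to reproduce this kind of view. The sketch below uses invented rows, and treating a unit price of exactly zero as a free item is an assumption.

```python
import pandas as pd

# Invented sample: free items in May and July, none in June.
df = pd.DataFrame({
    "invoice_date": pd.to_datetime(
        ["2011-05-10", "2011-05-20", "2011-06-15", "2011-07-01"]
    ),
    "unit_price": [0.0, 1.25, 2.0, 0.0],
})
df["year_month"] = df["invoice_date"].dt.to_period("M")

# Flag the free rows, then sum the flags per month so empty months stay visible.
free_per_month = (
    df.assign(is_free=df["unit_price"] == 0)
    .groupby("year_month")["is_free"]
    .sum()
)
months_without_free = free_per_month[free_per_month == 0].index.astype(str).tolist()

print(free_per_month)
print("Months without free items:", months_without_free)
```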
Top 5 Countries with Maximum Number of Orders
As expected, the company receives the highest number of orders in the UK (since it is a UK-based company).
For a clearer comparison among the other countries, the UK is then removed. The top 5 countries (including the UK) that place the highest number of orders are:
As the company receives the highest number of orders from customers in the UK, it is natural to see that customers in the UK spend the most on their purchases.
The UK is removed for a clearer comparison among the other countries. The top five countries that spent the most money on purchases were:
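Both country rankings, with and without the UK, can be sketched with a single groupby. The rows below are invented and only illustrate the mechanics; `amount_spent` is an assumed column name.

```python
import pandas as pd

# Invented sample rows (not real data).
df = pd.DataFrame({
    "invoice_num": ["1", "2", "3", "4", "5", "6"],
    "country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "Germany", "France", "Germany"],
    "amount_spent": [10.0, 15.0, 20.0, 40.0, 100.0, 5.0],
})

# Orders and money spent per country.
by_country = df.groupby("country").agg(
    orders=("invoice_num", "nunique"),
    money_spent=("amount_spent", "sum"),
)

# Drop the dominant home market so the remaining countries can be compared.
without_uk = by_country.drop(index="United Kingdom")
top_orders = without_uk["orders"].nlargest(5)
top_spend = without_uk["money_spent"].nlargest(5)

print(top_orders)
print(top_spend)
```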
As the examples above show, EDA is the process of visualizing and analyzing data to extract insights from it. EDA summarizes important characteristics of data in order to gain a better understanding of the dataset. With the help of EDA, marketers can:
Performing EDA on B2B eCommerce data allows brands to surface interesting results from their business data. This analysis can be used to validate business assumptions and form hypotheses. EDA also helps in interpreting a machine learning model’s output. EDA works best when data-driven insights are blended with business understanding and real-time scenarios to better interpret the story that the data tells.
Valasys Media is a pioneer B2B Media Publishing company. Our tailored B2B marketing & advertising solutions are breaking the barriers and helping our B2B clients make better and more refined business decisions. Our data-driven services viz. Lead Management, Data Solutions, Sales Pipeline Management, and Business Intelligence Solutions help customers in maxing their ways to optimum revenue and help them engineer perennially healthy sales pipelines.