Sales Analysis of a Supermarket using pandas

Olakunle Yusuf
Mar 30, 2022
image from Pixabay

Introduction

Supermarkets help us get our daily household goods, groceries and gift items. In 2019, the global retail market generated sales of nearly 25 trillion U.S. dollars, with a forecast to reach close to 27 trillion U.S. dollars by 2022 (Statista, 2022).

Business Task

The dataset provided is used to answer key business questions. These questions and their answers will drive actionable insights.

What products have brought in the most revenue?

What category of products is the most profitable?

What is the peak month for sales?

Is there a yearly growth for the business?

Which customer is the most valuable? What is the most used payment channel?

Data Source

The dataset for this analysis was obtained from GitHub. The data is stored as a CSV file; it is structured and organized in rows and columns.

Data Cleaning and Manipulation

Python is the tool I have chosen for this project. The pandas library provides efficient cleaning tools and quick visualizations that make it easy to gain insights.

The dataset was obtained from Kaggle. The CSV file contains 51,290 rows and 21 columns. The columns are order_id, order_date, ship_date, ship_mode, customer_name, segment, state, country, market, region, product_id, category, sub_category, product_name, sales, quantity, discount, profit, shipping_cost, order_priority and year. There are no null values.
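
The row, column and null-value checks described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact code used for the analysis: the three-row CSV text is made up so the snippet runs on its own, and with the real file its path would be passed to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; the values are invented.
csv_text = """order_id,order_date,category,sales,profit
A-1,2011-01-05,Technology,100.0,20.0
A-2,2011-02-10,Furniture,250.0,-5.0
A-3,2012-12-24,Office Supplies,40.0,8.0
"""

# With the real file, its path would replace io.StringIO(csv_text).
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])

n_rows, n_cols = df.shape            # (number of rows, number of columns)
null_counts = df.isnull().sum()      # missing values per column
total_nulls = int(null_counts.sum())

print(f"{n_rows} rows, {n_cols} columns, {total_nulls} null values")
```

Parsing order_date as a date at load time makes the month and year breakdowns later in the analysis simpler.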

In every year, December is the month with the most sales, with the exception of 2014. My guess is that the sales data for December 2014 is incomplete, since it is the last month in the dataset. The ‘ember’ months (September to December) generally have a high volume of sales, most likely because of the holiday season.

Monthly sales from 2011–2014
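
One way to build a month-by-year sales table like this is to group on the year and month of order_date. The sketch below uses a handful of invented rows, so the numbers are illustrative only; the column names order_date and sales come from the dataset.

```python
import pandas as pd

# Made-up sample rows; only the column names match the dataset.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2011-06-15", "2011-12-20", "2011-12-24",
        "2012-03-02", "2012-11-30", "2012-12-05",
    ]),
    "sales": [120.0, 300.0, 250.0, 80.0, 150.0, 400.0],
})

# Total sales per (year, month), reshaped to one row per year
# and one column per month.
monthly = (
    df.assign(year=df["order_date"].dt.year,
              month=df["order_date"].dt.month)
      .groupby(["year", "month"])["sales"]
      .sum()
      .unstack("month", fill_value=0)
)

# Month number with the highest sales in each year.
peak_month = monthly.idxmax(axis=1)

print(monthly)
print(peak_month)
```

Calling monthly.T.plot() would draw one line per year, which is a quick way to compare seasonal patterns.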

Technology is the category with the most sales, at 4.7 million dollars, followed by Furniture. Office Supplies has the lowest sales, at 3.8 million dollars.

Interestingly, Technology is also the most profitable category, with 663,778 dollars in profit. Furniture has the second-highest sales, as shown above, but it is the least profitable category, with 286,782 dollars in profit. Office Supplies is the second most profitable category, with a whopping 518,472 dollars in profit. This is largely due to the sheer number of transactions for Office Supplies.
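
The comparison of sales, profit and transaction counts per category can be sketched with a single groupby. The rows below are invented and merely mimic the pattern described above (Furniture second in sales but last in profit); they are not the real totals.

```python
import pandas as pd

# Invented rows that imitate the pattern in the text, not real figures.
df = pd.DataFrame({
    "category": ["Technology", "Technology", "Furniture", "Furniture",
                 "Office Supplies", "Office Supplies",
                 "Office Supplies", "Office Supplies"],
    "sales": [500.0, 300.0, 400.0, 300.0, 60.0, 50.0, 55.0, 55.0],
    "profit": [90.0, 60.0, 10.0, 5.0, 15.0, 20.0, 15.0, 15.0],
})

# Total sales, total profit and row count per category.
by_category = df.groupby("category").agg(
    total_sales=("sales", "sum"),
    total_profit=("profit", "sum"),
    transactions=("sales", "size"),
)

# Category names ranked from highest to lowest on each measure.
sales_rank = by_category["total_sales"].sort_values(ascending=False).index.tolist()
profit_rank = by_category["total_profit"].sort_values(ascending=False).index.tolist()

print(by_category)
print("By sales: ", sales_rank)
print("By profit:", profit_rank)
```

Keeping the transaction count next to the totals makes it easier to see when a category earns its profit through volume rather than through high-value orders.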

The chart below shows the customers with the most transactions with the supermarket. This can be tricky, however, because multiple customers can share the same first name and surname. The problem could be avoided if, in subsequent years, each customer were given a unique ID.

Having the most transactions does not necessarily translate to having the highest sales. The customers who have spent the most dollars are shown below.

As can be seen above, only Bart Watters is in both the top 10 by transactions and the top 10 by sales.
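
Finding customers who appear in both rankings can be sketched as below. The customer names and amounts are hypothetical, top_n is set to 2 instead of 10 to keep the sample small, and counting a transaction as a distinct order_id is an assumption about how transactions were counted.

```python
import pandas as pd

# Hypothetical customers and amounts, for illustration only.
df = pd.DataFrame({
    "order_id": ["O1", "O2", "O3", "O4", "O5", "O6", "O7"],
    "customer_name": ["Ann Lee", "Ann Lee", "Ann Lee", "Bo Chan",
                      "Bo Chan", "Cy Diaz", "Di Evans"],
    "sales": [10.0, 20.0, 15.0, 400.0, 300.0, 900.0, 5.0],
})

top_n = 2  # the article uses the top 10

# Assumption: one transaction corresponds to one distinct order_id.
by_customer = df.groupby("customer_name").agg(
    transactions=("order_id", "nunique"),
    total_sales=("sales", "sum"),
)

most_transactions = by_customer["transactions"].nlargest(top_n)
most_sales = by_customer["total_sales"].nlargest(top_n)

# Customers present in both top-N lists.
in_both = sorted(set(most_transactions.index) & set(most_sales.index))
print(in_both)
```

Because the grouping key is customer_name, two different people with the same name would be merged into one row, which is the limitation noted earlier.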

There is yearly growth in both the number of transactions and total sales.

Yearly Transactions and Yearly Sales
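
Yearly growth can be checked by aggregating per year and comparing each year with the previous one. The sketch below uses invented figures and again assumes that a transaction is a distinct order_id; the year column is the one listed in the dataset.

```python
import pandas as pd

# Invented sample rows; only the column names match the dataset.
df = pd.DataFrame({
    "order_id": ["O1", "O2", "O3", "O4", "O5", "O6"],
    "year": [2011, 2012, 2012, 2013, 2013, 2013],
    "sales": [100.0, 80.0, 70.0, 60.0, 90.0, 75.0],
})

yearly = df.groupby("year").agg(
    transactions=("order_id", "nunique"),
    total_sales=("sales", "sum"),
)

# Year-over-year change in total sales, in percent (NaN for the first year).
yearly["sales_growth_pct"] = yearly["total_sales"].pct_change() * 100

# True only if total sales rose in every year after the first.
grows_every_year = bool((yearly["total_sales"].diff().dropna() > 0).all())

print(yearly)
print("Sales grew every year:", grows_every_year)
```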

Conclusions/Recommendations

Most sales take place during the holiday season, with December being the peak month. There is also a sales spike in June, during the summer sales.

The Technology category brings in the most sales. The top 4 products by sales are all in the Technology category. Most profits also come from the Technology category.

The number of transactions increases every year, and the total sales recorded each year increase along with it.

Olakunle Yusuf

I am a data analyst with strong analytical skills. I recently earned Google Data Analytics Professional Certificate. SQL | R | Python | Tableau