Data Analysis of Google Play apps using Python
Apps are part of our lives now more than ever. The average phone user is likely to check his/her phone 47 times a day. Gen-Z users, however, check their phones more often: a whopping 86 times a day, according to Deloitte's 2017 Global Mobile Consumer Survey: U.S. edition, released in December 2017.
According to eMarketer, 88% of the time spent on phones is spent in apps. What exactly are people doing in these apps? Which apps are the most popular? Which apps are free? How can a developer make more money from them? I plan to answer these questions, and many more, using data analysis.
The dataset for this analysis was obtained from Kaggle. It is stored in CSV format and is structured, organized in rows and columns.
This dataset does not contain every app on the Google Play Store. It is web-scraped data on 10k Play Store apps, collected for analyzing the Android market.
Data Cleaning and Manipulation
Python is the tool I have chosen for this project. The pandas library provides efficient data-cleaning tools, and its plotting functions make it quick to visualize the data and gain insights.
I downloaded the Google Play Store dataset from Kaggle. The CSV file contains 10,841 rows and 13 columns: App, Category, Rating, Reviews, Size, Installs, Type, Price, Content Rating, Genres, Last Updated, Current Ver and Android Ver.
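Loading and inspecting the file with pandas might look like the minimal sketch below. The filename `googleplaystore.csv` is an assumption, and the two rows are made up so the snippet runs without the download. They only mirror the 13 column names listed above.

```python
import io

import pandas as pd

# Hypothetical two-row sample using the dataset's 13 column names.
# With the real download you would call pd.read_csv("googleplaystore.csv")
# (assumed filename) instead of reading from this string.
SAMPLE_CSV = """App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
Example Photo Editor,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
Example Puzzle Pro,GAME,4.5,2000,25M,"100,000+",Paid,$2.99,Everyone,Puzzle,"March 1, 2018",2.1,4.1 and up
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)          # (rows, columns); the full file has 10,841 rows and 13 columns
print(list(df.columns))  # the 13 column names
df.info()                # dtype and non-null count for each column
```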
A boxplot of the Rating column shows values concentrated around 4.5, with an outlier at 19.0. Since the maximum possible rating is 5.0, the 19.0 value must be an error, so this row was removed from the dataset.
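A minimal sketch of removing the out-of-range rating on hypothetical data. Comparisons with NaN evaluate to False, so a plain `df[df["Rating"] <= 5]` would also drop every row with a missing rating. Negating `> 5` removes only the invalid values and keeps the nulls for the filling step that comes later.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: one missing value and one impossible 19.0.
df = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.5, np.nan, 19.0, 3.9],
})

# Drop only ratings above the 5.0 maximum; NaN > 5 is False, so ~(...) keeps NaN rows.
df = df[~(df["Rating"] > 5)].reset_index(drop=True)

print(df)
```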
A fresh boxplot shows the values concentrated between 4.0 and 4.5, with all values between 0 and 5.
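A boxplot is drawn from quartiles, and you can also compute the numbers behind it directly. This sketch uses hypothetical ratings and the common 1.5 × IQR rule for flagging outliers, which is matplotlib's default whisker length. The article does not say which plotting settings were used.

```python
import pandas as pd

# Hypothetical cleaned ratings.
ratings = pd.Series([3.8, 4.0, 4.2, 4.3, 4.4, 4.5, 4.6, 1.0])

# The box spans Q1..Q3 with a line at the median.
q1, median, q3 = ratings.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points more than 1.5 * IQR beyond the box are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ratings[(ratings < lower_fence) | (ratings > upper_fence)]

print(f"Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}")
print("outliers:", outliers.tolist())
```

Note that, under this rule, a valid but unusually low rating can also show up as an outlier. The 19.0 value was removed because it is impossible, not because it fell outside the whiskers.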
The dataset contains null values. The Rating column has 1,474 null values and the Type column has 1, while the Current Ver and Android Ver columns have 8 and 2 respectively. To keep these rows in the analysis, the null values were filled with suitable average values.
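Counting and filling nulls might look like the sketch below. It runs on hypothetical data and assumes the median for the numeric Rating column and the most frequent value (mode) for the text columns, since a mean is not defined for values like "Free" or "4.0 and up". The original fill values are not specified beyond "suitable average values".

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in the four columns that have nulls.
df = pd.DataFrame({
    "Rating": [4.1, np.nan, 3.9, 4.7],
    "Type": ["Free", "Free", np.nan, "Paid"],
    "Current Ver": ["1.0", np.nan, "2.3", "1.0"],
    "Android Ver": ["4.0 and up", "4.0 and up", np.nan, "5.0 and up"],
})

print(df.isnull().sum())  # null count per column before filling

# Numeric column: fill with the median (assumption; robust to skewed values).
df["Rating"] = df["Rating"].fillna(df["Rating"].median())

# Text columns: fill with the most frequent value (assumption).
for col in ["Type", "Current Ver", "Android Ver"]:
    df[col] = df[col].fillna(df[col].mode()[0])

print(df.isnull().sum().sum())  # 0 once every gap is filled
```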
The Reviews, Installs and Price columns are stored as non-numeric (text) data types. These columns were reformatted and converted to numeric data types.
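A minimal conversion sketch, assuming Installs values look like "10,000+" and Price values look like "$4.99". Both formats are assumptions used here for illustration. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical text-typed values in the assumed raw formats.
df = pd.DataFrame({
    "Reviews": ["159", "2000", "87510"],
    "Installs": ["10,000+", "500,000+", "0"],
    "Price": ["0", "$4.99", "$0.99"],
})

# Installs: remove thousands separators and the trailing "+", then parse.
df["Installs"] = pd.to_numeric(
    df["Installs"].str.replace(",", "", regex=False).str.rstrip("+")
)

# Price: remove the leading "$", then parse as a float.
df["Price"] = pd.to_numeric(df["Price"].str.lstrip("$"))

# Reviews: only a type conversion is needed; errors="coerce" turns any
# unparseable entry into NaN instead of raising an exception.
df["Reviews"] = pd.to_numeric(df["Reviews"], errors="coerce")

print(df.dtypes)
```

Because "10,000+" means "at least 10,000", the converted Installs values are lower bounds rather than exact counts.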
The categories with the most apps in the dataset are FAMILY, GAME, TOOLS, MEDICAL and BUSINESS.
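This kind of ranking is a count per category. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical Category column.
df = pd.DataFrame({"Category": ["FAMILY", "GAME", "FAMILY", "TOOLS", "FAMILY", "GAME"]})

# value_counts() sorts by frequency, so head(5) gives the five largest categories.
top_categories = df["Category"].value_counts().head(5)
print(top_categories)
```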
The categories with the most installs are GAME, COMMUNICATION, PRODUCTIVITY, SOCIAL and TOOLS.
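Total installs per category come from a group-by sum over the numeric Installs column. This sketch uses hypothetical, already-converted values.

```python
import pandas as pd

# Hypothetical rows; Installs is assumed to be numeric already.
df = pd.DataFrame({
    "Category": ["GAME", "SOCIAL", "GAME", "TOOLS"],
    "Installs": [1_000_000, 5_000_000, 10_000_000, 100_000],
})

# Sum installs within each category and keep the five largest totals.
installs_by_category = df.groupby("Category")["Installs"].sum().nlargest(5)
print(installs_by_category)
```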
The categories with the highest total (summed) ratings are FAMILY, GAME and TOOLS. This chart does not say much; in fact, it is very similar to the chart of the number of apps in each category. Because average ratings are closely clustered around 4.5, a category's total rating mostly reflects how many apps it contains.
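The reasoning can be written as a one-line identity. The symbols are introduced here for illustration: for a category $c$ with $n_c$ apps rated $r_1, \dots, r_{n_c}$, the total rating $R_c$ equals the app count times the average rating $\bar{r}_c$:

$$
R_c = \sum_{i=1}^{n_c} r_i = n_c \, \bar{r}_c .
$$

If $\bar{r}_c \approx 4.5$ for every category, then $R_c \approx 4.5\, n_c$. Ranking categories by total rating therefore essentially reproduces the ranking by number of apps.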
A more insightful chart is the average rating of apps in each category. The categories with the highest average rating per app are EVENTS, EDUCATION, ART_AND_DESIGN, BOOKS_AND_REFERENCE and PERSONALIZATION.
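A minimal sketch contrasting the two aggregations on hypothetical data. The category with more apps wins on the sum, while the category with better-rated apps wins on the mean.

```python
import pandas as pd

# Hypothetical ratings: TOOLS has more apps, EVENTS has higher ratings.
df = pd.DataFrame({
    "Category": ["EVENTS", "EVENTS", "TOOLS", "TOOLS", "TOOLS"],
    "Rating": [4.6, 4.4, 4.0, 4.2, 3.8],
})

# count, sum and mean per category side by side.
rating_stats = df.groupby("Category")["Rating"].agg(["count", "sum", "mean"])
print(rating_stats)

# Rank by the per-app average rather than the total.
top_by_mean = rating_stats["mean"].nlargest(5)
print(top_by_mean)
```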
Free apps make up 92.62% of the apps in the dataset, while paid apps account for 7.38%.
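Percentages like these come from a normalized count of the Type column. A sketch on a hypothetical ten-row sample:

```python
import pandas as pd

# Hypothetical Type column: nine free apps and one paid app.
df = pd.DataFrame({"Type": ["Free"] * 9 + ["Paid"]})

# normalize=True returns proportions; multiply by 100 for percentages.
share = df["Type"].value_counts(normalize=True).mul(100).round(2)
print(share)
```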
The categories whose apps have earned the most are FAMILY, LIFESTYLE, GAME, FINANCE and PHOTOGRAPHY.
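A minimal sketch of the earnings calculation, assuming Earnings is estimated as Price × Installs. That formula is an assumption made here. It counts each install as one sale at the listed price, so free apps earn zero, and in-app purchases and ads are ignored. Since Installs values are lower bounds, the result is a rough estimate. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows with numeric Price and Installs.
df = pd.DataFrame({
    "Category": ["FAMILY", "GAME", "FAMILY", "TOOLS"],
    "Price": [4.99, 0.0, 1.99, 0.99],
    "Installs": [100_000, 10_000_000, 50_000, 1_000],
})

# Assumed estimate: every install is one sale at the listed price.
df["Earnings"] = df["Price"] * df["Installs"]

# Total estimated earnings per category, largest first.
earnings_by_category = df.groupby("Category")["Earnings"].sum().nlargest(5)
print(earnings_by_category)
```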
Across the whole dataset, no noticeable correlation was observed among the numeric fields: Price, Reviews, Installs, Rating and Earnings.
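A correlation matrix over the numeric fields can be computed with `DataFrame.corr()`, which uses Pearson correlation by default. The sketch below uses hypothetical values and reuses the assumed Price × Installs estimate for Earnings.

```python
import pandas as pd

# Hypothetical numeric fields for five apps.
df = pd.DataFrame({
    "Price": [0.0, 0.0, 2.99, 0.0, 4.99],
    "Reviews": [120, 5400, 80, 30000, 15],
    "Installs": [10_000, 1_000_000, 5_000, 5_000_000, 1_000],
    "Rating": [4.1, 4.5, 3.9, 4.3, 4.8],
})
df["Earnings"] = df["Price"] * df["Installs"]  # assumed earnings estimate

cols = ["Price", "Reviews", "Installs", "Rating", "Earnings"]
corr = df[cols].corr()  # Pearson correlation between every pair of columns
print(corr.round(2))
```

Install and review counts are typically heavily skewed. `df[cols].corr(method="spearman")` computes a rank-based correlation and is a reasonable cross-check.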
However, when only paid apps were considered, some interesting correlations emerged. The number of installs has a strong correlation with the number of reviews, and earnings have a strong correlation with the number of installs. The price of an app and its earnings, on the other hand, are not strongly correlated.
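Restricting the analysis to paid apps means filtering on Type before computing the correlations. A sketch on hypothetical rows, with Earnings again assumed to be Price × Installs:

```python
import pandas as pd

# Hypothetical rows: two free apps and four paid apps.
df = pd.DataFrame({
    "Type": ["Free", "Free", "Paid", "Paid", "Paid", "Paid"],
    "Price": [0.0, 0.0, 1.99, 0.99, 2.99, 0.99],
    "Installs": [5_000_000, 10_000, 1_000, 10_000, 100_000, 1_000_000],
    "Reviews": [12_000, 900, 20, 150, 1_800, 21_000],
})
df["Earnings"] = df["Price"] * df["Installs"]  # assumed earnings estimate

# Keep only paid apps, then correlate their numeric fields.
paid = df[df["Type"] == "Paid"]
paid_corr = paid[["Price", "Reviews", "Installs", "Earnings"]].corr()
print(paid_corr.round(2))
```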
- The apps with the most earnings are in the FAMILY, LIFESTYLE, GAME, FINANCE and PHOTOGRAPHY categories. A developer or entrepreneur looking to invest could explore these categories.
- Apps with the highest average ratings are in the EVENTS, EDUCATION, ART_AND_DESIGN, BOOKS_AND_REFERENCE and PERSONALIZATION categories.
- The majority of apps in the Google Play store are free.
- There is a high correlation between the number of installs and reviews for paid apps.
- There is a high correlation between the number of installs and earnings for paid apps.
- The number of reviews and earnings also show a strong correlation.