Profiling

TODO: Use chatgpt to generate content

Generate Reports Using Pandas Profiling, Deploy Using Streamlit

Kaustubh Gupta
Last Updated: 23 Oct, 2024

The pandas library offers a wide range of functions and is an indispensable tool for data manipulation. One convenient feature for gaining quick insight into a dataset is the pandas describe function, which gives a descriptive statistical summary of the features in a dataframe. For a more comprehensive analysis, the pandas profiling package is a valuable addition to the pandas ecosystem: it generates a full report for the dataset, with many features and customization options.

In this article, we will explore this library, look at the features it provides, and cover some advanced use cases and integrations that help turn data frames into detailed reports.

This article was published as a part of the Data Science Blogathon.
Installation

Like any other Python package, pandas profiling can be installed with the pip package manager:

    pip install pandas-profiling

It can also be installed with the Conda package manager:

    conda create -n pandas-profiling
    conda activate pandas-profiling
    conda install -c conda-forge pandas-profiling

Dataset and Setup

Now let's see how to use the pandas profiling library to generate a report from a dataframe. First, import the dataset to be profiled. I am using an agriculture dataset that contains the columns State_Name, District_Name, Crop_Year, Season, Crop, Area, and Production.

    import pandas as pd
    df = pd.read_csv("crops data.csv")

Before discussing pandas profiling, have a look at the output of the pandas describe function for this dataframe:

    df.describe(include='all')

(Notice that the include parameter of describe is set to "all", which makes pandas include columns of every data type in the summary. String columns get additional statistics such as unique, top, and freq.)

Now import the pandas profiling library:

    from pandas_profiling import ProfileReport

There are two ways to start profiling a dataframe:
You can call the .profile_report() function on a pandas dataframe. This function is not part of the pandas API, but as soon as you import the profiling library, it is added to dataframe objects.
You can pass the dataframe object to the ProfileReport function and then display the returned object to start generating the profile.

Both methods produce the same report. I am using the second method to generate the report for the imported agriculture dataset:

    profile = ProfileReport(df)
    profile

(Animation showing report generation)

Sections of the Report

Now that the report is generated, let's explore its sections one by one.

Overview

This section consists of three tabs: Overview, Warnings, and Reproduction.

The Overview tab summarizes the dataset through a set of key statistics: the number of variables (features or columns of the dataframe) and the number of observations (rows of the dataframe). It also covers data quality through the number of missing cells and their percentage, which gives a quick check of the dataset's completeness. The duplicate rows entry reports the number of identical rows and their percentage. Finally, the overview gives the total memory size of the dataset.

The Warnings tab lists any warnings related to cardinality, correlation with other variables, missing values, zeros, skewness of the variables, and many others.
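To make the Overview figures concrete, here is a minimal sketch that approximates them with plain pandas. It is not how pandas profiling computes them internally; the overview helper and the toy dataframe are hypothetical stand-ins for illustration.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Approximate the Overview tab figures with plain pandas (illustrative helper)."""
    n_rows, n_cols = df.shape
    total_cells = n_rows * n_cols
    missing_cells = int(df.isna().sum().sum())
    duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100 * duplicate_rows / n_rows if n_rows else 0.0,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Hypothetical toy data standing in for the agriculture dataset.
toy = pd.DataFrame({
    "Crop": ["Rice", "Wheat", "Rice", "Maize"],
    "Area": [10.0, 20.0, 10.0, None],
    "Production": [30.0, 50.0, 30.0, 12.0],
})

summary = overview(toy)
print(summary)
```

The toy frame has one missing cell out of twelve and one repeated row, so it exercises both data quality figures.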
The Reproduction tab displays information about the report generation: the start and end times of the analysis, the time taken to generate the report, the pandas profiling software version, and a configuration download option.
We will discuss the configuration file in the advanced usage section of this article.

Variables

This section of the report gives a detailed analysis of every variable (column or feature) of the dataset. The information presented depends on the data type of the variable. Let's break it down.

Numeric Variables

For numeric features, you get the distinct values, missing values, minimum and maximum, mean, and the count of negative values. You also get a small histogram of the values.
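As a rough illustration of these headline numbers, the sketch below computes them for a single numeric column with plain pandas. The column values are made up, and the report's own counting rules may differ in details.

```python
import pandas as pd

# Hypothetical numeric column with one missing and one negative value.
area = pd.Series([10.0, -5.0, 0.0, 20.0, None, 20.0], name="Area")

headline = {
    "distinct": int(area.nunique()),    # NaN is not counted as a distinct value
    "missing": int(area.isna().sum()),
    "minimum": float(area.min()),       # min, max, and mean skip NaN by default
    "maximum": float(area.max()),
    "mean": float(area.mean()),
    "negative": int((area < 0).sum()),
}
print(headline)
```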
The toggle button expands to the Statistics, Histogram, Common Values, and Extreme Values tabs. The statistics tab includes:
Quantile statistics: minimum and maximum, percentiles, median, range, and IQR (interquartile range)
Descriptive statistics: standard deviation, coefficient of variation, kurtosis, mean, skewness, variance, and monotonicity
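The two groups of statistics above can be sketched with pandas as follows. This uses pandas defaults (sample standard deviation and variance, linear interpolation for quantiles), which is an assumption; the report may use slightly different conventions.

```python
import pandas as pd

# Hypothetical toy values for one numeric column.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="Production")

q1, median, q3 = values.quantile([0.25, 0.5, 0.75])

quantile_stats = {
    "minimum": values.min(),
    "5th percentile": values.quantile(0.05),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "95th percentile": values.quantile(0.95),
    "maximum": values.max(),
    "range": values.max() - values.min(),
    "IQR": q3 - q1,
}

descriptive_stats = {
    "standard deviation": values.std(),
    "coefficient of variation": values.std() / values.mean(),
    "kurtosis": values.kurt(),
    "mean": values.mean(),
    "skewness": values.skew(),
    "variance": values.var(),
    "monotonic increasing": values.is_monotonic_increasing,
}

print(quantile_stats)
print(descriptive_stats)
```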
The Histogram tab displays the frequency distribution of the numeric data. The Common Values tab is essentially the value_counts of the variable, presented as both counts and percentage frequencies.
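Since the Common Values tab is described in terms of value_counts, a small pandas sketch shows the kind of table it presents. The Season values here are invented for illustration.

```python
import pandas as pd

# Hypothetical values for one column.
season = pd.Series(
    ["Kharif", "Rabi", "Kharif", "Whole Year", "Kharif", "Rabi"], name="Season"
)

counts = season.value_counts()  # sorted from most to least frequent
common_values = pd.DataFrame({
    "count": counts,
    "frequency_pct": (100 * counts / len(season)).round(1),
})
print(common_values)
```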
String Variables

For string-type variables, you get the distinct (unique) values, distinct percentage, missing values, missing percentage, memory size, and a horizontal bar chart of the unique values with their counts.

(The report also shows any warnings associated with a variable, irrespective of its data type.)
The toggle button expands to the Overview, Categories, Words, and Characters tabs. The Overview tab displays the maximum, minimum, median, and mean length, total characters, distinct characters, distinct categories, unique characters, and a sample from the dataset for string-type values.

The Categories tab displays a histogram, and sometimes a pie chart, of the feature's value counts. The accompanying table contains the value, count, and percentage frequency.
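To see where the length and character figures of the string Overview tab come from, here is a minimal sketch using pandas string methods. The Crop values are hypothetical, and the example assumes the column has no missing values.

```python
import pandas as pd

# Hypothetical string column without missing values.
crop = pd.Series(["Rice", "Wheat", "Maize", "Rice"], name="Crop")

lengths = crop.str.len()
all_text = "".join(crop)  # would fail on NaN, hence the no-missing assumption

text_overview = {
    "max_length": int(lengths.max()),
    "min_length": int(lengths.min()),
    "median_length": float(lengths.median()),
    "mean_length": float(lengths.mean()),
    "total_characters": len(all_text),
    "distinct_characters": len(set(all_text)),  # case-sensitive
    "distinct_categories": int(crop.nunique()),
}
print(text_overview)
```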
The Words and Characters tabs do the same job as the Categories tab, presenting the data in tabular and histogram form. They also go deeper, with counts for lower case, upper case, punctuation, and special character categories.

Correlations

Correlation describes the degree to which two variables move in coordination with one another. The pandas profiling report provides five types of correlation coefficients: Pearson's r, Spearman's ρ, Kendall's τ, Phik (φk), and Cramér's V (φc).
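Two of these coefficients are easy to try out directly in pandas, which helps when reading the correlation plots. The sketch below uses made-up data in which Production grows monotonically but not linearly with Area, so Spearman's ρ is 1 while Pearson's r is slightly lower. Phik and Cramér's V are not part of pandas' corr method and are left out here.

```python
import pandas as pd

# Hypothetical data: Production is the square of Area.
toy = pd.DataFrame({
    "Area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Production": [1.0, 4.0, 9.0, 16.0, 25.0],
})

pearson = toy.corr(method="pearson")    # measures linear association
spearman = toy.corr(method="spearman")  # measures rank (monotonic) association

print(pearson.loc["Area", "Production"])
print(spearman.loc["Area", "Production"])
```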
You can also click the toggle button for details about the correlation coefficients.

Missing values

The report also contains visualizations of the missing values in the dataset. You get three types of plots: count, matrix, and dendrogram. The count plot is a basic bar plot with the column names on the x-axis; the length of each bar represents the number of values present (without null values). The matrix and the dendrogram plots are provided in the same section.
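The numbers behind the count plot can be reproduced with plain pandas, as in this small sketch on invented data. The boolean frame at the end is only an assumption about what a missing-value matrix encodes (one cell per value, marking whether it is missing), not the report's plotting code.

```python
import pandas as pd

# Hypothetical data with a few missing cells.
toy = pd.DataFrame({
    "Crop": ["Rice", "Wheat", None, "Maize"],
    "Area": [10.0, None, None, 5.0],
    "Production": [30.0, 50.0, 20.0, 12.0],
})

present = toy.notna().sum()  # bar heights of the count plot, per column
missing = toy.isna().sum()
nullity = toy.isna()         # True where a value is missing

print(present)
print(missing)
```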
Sample

This section displays the first and last 10 rows of the dataset.

How to Save the Report?

So far, you've learned how to generate a dataframe report with a single line of code and explored the sections it contains. You may want to export this analysis to an external file for integration with other applications or for publishing on the web. You can save the report in either of these formats:

HTML format
JSON format

The save function is the same for both formats; only the file extension changes. To save the report, call the .to_file() function on the profile object:

    profile.to_file("Analysis.html")
    profile.to_file("Analysis.json")

Advanced Usage

The report generated by pandas profiling is a complete analysis that needs no input from the user other than the dataframe object. All report elements are chosen automatically, using default values. There may be elements you don't want to include, or you may need to add your own metadata to the final report. This is where the advanced usage of the library comes in: you can control every aspect of the report by changing the default configuration. Let's look at some ways to customize your reports.

Add MetaData

You can add information such as "title", "description", "creator", "author", "url", "copyright_year", and "copyright_holder". This information appears in the dataset overview section, in a new tab called "dataset". To add it to the report, pass a dictionary to the dataset parameter of the ProfileReport function:

    profile = ProfileReport(
        df,
        title="Agriculture Data",
        dataset={
            "description": "This profiling report was generated for Analytics Vidhya Blog",
            "copyright_holder": "Analytics Vidhya",
            "copyright_year": "2021",
            "url": "https://www.analyticsvidhya.com/blog/",
        },
    )
    profile

You can also describe the variables in the dataset using the variables parameter. It takes a dictionary with the key "descriptions", whose value is another dictionary that maps each variable name to its description:

    variables={
        "descriptions": {
            "State_Name": "Name of the state",
            "District_Name": "Name of district",
            "Crop_Year": "Year when it was seeded",
            "Season": "Crop year",
            "Crop": "Which crop was seeded?",
            "Area": "How much area was allocated to the crop?",
            "Production": "How much production?",
        }
    }

When you add this to your ProfileReport call, a separate tab named "Variables" is created under the overview section.

Controlling parameters of the Report

Suppose you don't want to display all types of correlation coefficients. You can disable individual coefficients through the correlations configuration. This is also a dictionary and can be passed to the ProfileReport function:

    profile = ProfileReport(
        df,
        title="Agriculture Data",
        correlations={
            "pearson": {"calculate": True},
            "spearman": {"calculate": False},
            "kendall": {"calculate": False},
            "phi_k": {"calculate": False},
        },
    )
Similarly, you can customize every report section, the HTML format, the plots, and more. Check the documentation for details.

Integrations

After configuring every aspect of your report, you will probably want to publish it. You can export it to HTML and upload it to the web, but there are other ways to make your report stand out.

Widget in Jupyter notebook

When you run pandas profiling in a Jupyter notebook, the HTML report is rendered inside the code cell only, which makes it awkward to navigate. You can instead display it as a widget that is easily accessible and offers a compact view. To do this, call .to_widgets() on your profile object.
How to Make it a Part of Streamlit App?

You can also make this report part of a Streamlit app. Streamlit is a powerful package for building GUI web apps with minimal code. The applications are interactive and compatible with almost every device. Follow these steps:

Step 1: Install the streamlit_pandas_profiling

    pip install streamlit-pandas-profiling

Step 2: Create a Python file

Create a Python file and write code in this format:

    import pandas as pd
    import pandas_profiling
    import streamlit as st
    from streamlit_pandas_profiling import st_profile_report
    from pandas_profiling import ProfileReport

    df = pd.read_csv("crops data.csv", na_values=['='])

    profile = ProfileReport(
        df,
        title="Agriculture Data",
        dataset={
            "description": "This profiling report was generated for Analytics Vidhya Blog",
            "copyright_holder": "Analytics Vidhya",
            "copyright_year": "2021",
            "url": "https://www.analyticsvidhya.com/blog/",
        },
        variables={
            "descriptions": {
                "State_Name": "Name of the state",
                "District_Name": "Name of district",
                "Crop_Year": "Year when it was seeded",
                "Season": "Crop year",
                "Crop": "Which crop was seeded?",
                "Area": "How much area was allocated to the crop?",
                "Production": "How much production?",
            }
        },
    )

    st.title("Pandas Profiling in Streamlit!")
    st.write(df)
    st_profile_report(profile)

Step 3: Run your streamlit app

In the terminal, type:

    streamlit run .py

Exploratory Data Analysis Using Pandas
Exploratory Data Analysis (EDA) is like exploring a new place. You start by looking around to understand what's there. Similarly, in EDA, you look at a dataset to see what's in it and what it can tell you. It's essentially the initial data exploration stage in data science, where you delve into the dataset statistics and examine its intricacies. Here's what you do during EDA:

Look at the Numbers: You start by checking basic things like averages, ranges, and the spread of the numbers.
Make Pictures: Instead of just staring at numbers, you make charts and graphs to show the data visually. It's like drawing a map of your exploration.
Clean Up: Sometimes, data can be messy with missing pieces or weird values. So, you clean it up by filling in missing parts or removing the weird stuff.
Create New Ideas: You might develop new ideas or ways to look at the data, like combining different parts or changing how you measure things.
Find Connections: You try to see if different parts of the data are related. For example, if one thing goes up, does another also go up?
Make Things Simple: If the data is too complicated, you might simplify it to see the big picture more clearly.
Look at Time: If your data changes over time, you'll examine how it changes and whether there are any repeating patterns.
Test Ideas: Finally, you test your ideas to see if they make sense and if the patterns you see are real or just random.

Overall, EDA helps you understand your data better before doing any fancy analysis or making big conclusions. It's like exploring a map before going on a big adventure!

Conclusion

In this article, you were introduced to pandas profiling, a one-stop solution for generating reports from pandas dataframes. We explored the features of the tool, the different sections of the report, and their content. We then saved the generated report, looked at some advanced use cases of the library, and finally integrated the report into a Streamlit app to make it more interactive.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.

Kaustubh Gupta

Kaustubh Gupta is a skilled engineer with a B.Tech in Information Technology from Maharaja Agrasen Institute of Technology. With experience as a CS Analyst and Analyst Intern at Prodigal Technologies, Kaustubh excels in Python, SQL, Libraries, and various engineering tools. He has developed core components of product intent engines, created gold tables in Databricks, and built internal tools and dashboards using Streamlit and Tableau. Recognized as India's Top 5 Community Contributor 2023 by Analytics Vidhya, Kaustubh is also a prolific writer and mentor, contributing significantly to the tech community through speaking sessions and workshops.
Responses From Readers

Santosh Kesava: Hi, this is a really informative post, thank you for posting. My scenario is the same, but the only missing part is how to create a tab with complete data and a button to download it in Excel or CSV. Please help. Thank you.

Niladri Chakraborty: Nice article. I'm trying to run pandas profiling on Python 3.8.x, but I'm getting an error message saying "PydanticImportError: BaseSettings has been moved to the pydantic-settings package" while running "from pandas_profiling import ProfileReport". Can you please guide me to resolve the issue?
Frequently Asked Questions

Q. How to use pandas-profiling?
A. To use pandas-profiling, first install it using pip. Then import it into your Python script or Jupyter Notebook. Load your dataset with pandas, create a ProfileReport object, and call its to_file() or to_widgets() method to obtain a detailed analysis and visualization of your data.