Pandas Python Library Overview



Welcome to Part 14 of our Data Science Blog series! In this post, we will explore Pandas, a popular Python library for data manipulation and analysis. Pandas provides data structures and functions that make working with structured data (CSV files, Excel sheets, SQL tables, and so on) much easier and more efficient.

Let's dive into some essential aspects of the Pandas library with code examples:

Before we begin, ensure that you have Pandas installed. If not, you can install it using pip:

pip install pandas

To use Pandas in your Python code, you need to import it:

import pandas as pd

Pandas provides various methods to read data from different file formats. For this example, we will read data from a CSV file:

# Assuming you have a file named "data.csv" in the current directory
df = pd.read_csv("data.csv")
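read_csv also accepts optional parameters that control what gets loaded. The sketch below reads from an in-memory string instead of a file so it runs anywhere; the sample rows and the `sample` variable name are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "data.csv"
csv_text = """Name,Age,Gender,Income
Alice,34,Female,52000
Bob,17,Male,0
Carla,70,Female,31000
"""

# usecols limits which columns are loaded; dtype sets column types up front
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Name", "Age", "Income"],
    dtype={"Age": "int64", "Income": "float64"},
)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # the type pandas assigned to each column
```

Loading only the columns you need can noticeably cut memory use on wide files.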


Let's start by examining the basic structure of the DataFrame and some summary statistics:

# Display the first few rows of the DataFrame
print(df.head())

# Get information about the DataFrame (info() prints directly and returns None)
df.info()

# Get summary statistics of the numerical columns
print(df.describe())
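A few other attributes are handy for a first look at a dataset. This is a small sketch on a hand-built DataFrame; the `people` table and its values are illustrative assumptions, not data from this series.

```python
import pandas as pd

# Hypothetical data with the same columns used in this post
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Age": [34, 17, 70, 25],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Income": [52000, 0, 31000, 40000],
})

print(people.shape)    # (rows, columns)
print(people.dtypes)   # data type of each column

# describe() returns a DataFrame, so individual statistics can be looked up
summary = people.describe()
print(summary.loc["mean", "Age"])

# value_counts() summarizes a non-numeric column
gender_counts = people["Gender"].value_counts()
print(gender_counts)
```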

Pandas allows you to select specific rows and columns from the DataFrame:

# Select a single column
column_name = "Age"
age_column = df[column_name]

# Select multiple columns
selected_columns = df[["Name", "Age", "Gender"]]

# Select rows based on condition
young_people = df[df["Age"] < 30]

# Select rows based on multiple conditions
female_seniors = df[(df["Gender"] == "Female") & (df["Age"] > 65)]
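Row and column selection can also be combined in one step. The sketch below shows three common variants on a small made-up `people` table; the data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Age": [34, 17, 70, 25],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# .loc takes a row condition and a list of column labels together
young_names = people.loc[people["Age"] < 30, ["Name", "Age"]]

# | means "or"; each condition needs its own parentheses
young_or_senior = people[(people["Age"] < 18) | (people["Age"] > 65)]

# query() expresses the same kind of filter as a string
female_seniors = people.query("Gender == 'Female' and Age > 65")

print(young_names)
print(young_or_senior)
print(female_seniors)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.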

Pandas makes it easy to modify data in the DataFrame:

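# Adding a new column
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 18, 30, 50, 100], labels=["Child", "Young", "Adult", "Senior"])

# Updating values in a column based on conditions
# pd.cut returns a categorical column, so register the new label first
df["AgeGroup"] = df["AgeGroup"].cat.add_categories("Minor")
df.loc[df["Age"] < 18, "AgeGroup"] = "Minor"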
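It helps to know where pd.cut places values that fall exactly on a bin edge. By default each bin includes its right edge and excludes its left one. The sketch below uses a made-up list of ages to show this.

```python
import pandas as pd

ages = pd.Series([5, 18, 19, 30, 50, 51, 100])
bins = [0, 18, 30, 50, 100]
labels = ["Child", "Young", "Adult", "Senior"]

# Default (right=True): bins are (0, 18], (18, 30], (30, 50], (50, 100]
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())

# right=False flips the edges: [0, 18), [18, 30), [30, 50), [50, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)
print(left_closed.tolist())

# Values outside every bin become NaN; with the default edges, 0 is excluded
zero_age = pd.cut(pd.Series([0]), bins=bins, labels=labels)
print(zero_age.isna().tolist())
```

With the default settings an 18-year-old lands in "Child", while with `right=False` the same person lands in "Young" and an age of 100 falls outside the bins.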

Pandas also lets you group rows and compute aggregate statistics for each group:

# Group data by a column and calculate mean
grouped_data = df.groupby("Gender")["Age"].mean()

# Group data by multiple columns and calculate multiple statistics
grouped_stats = df.groupby(["Gender", "AgeGroup"]).agg({"Age": "mean", "Income": "sum"})
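When several statistics are computed at once, named aggregation gives each result column a readable name. This is a minimal sketch on made-up data; the output column names (`mean_age`, `total_income`, `n_people`) are arbitrary choices.

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [34, 17, 70, 25],
    "Income": [52000, 0, 31000, 40000],
})

# Each keyword is an output column: (source column, aggregation)
summary = people.groupby("Gender").agg(
    mean_age=("Age", "mean"),
    total_income=("Income", "sum"),
    n_people=("Age", "size"),
)

print(summary)
```

The result is indexed by the grouping column, so `summary.loc["Female", "mean_age"]` looks up a single value.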

Pandas provides functions to handle missing data:

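# Check for missing values
print(df.isnull().sum())

# Drop rows with any missing values
df_cleaned = df.dropna()

# Fill missing values in a column with a specific value
# (a bare df.fillna(0) raises an error on a categorical column such as AgeGroup)
df_filled = df.fillna({"Income": 0})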
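Dropping or zero-filling everything is often too blunt. The sketch below, on made-up data, drops rows only when one particular column is missing and fills each column with its own replacement value; using the median for Age is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few gaps
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Age": [34.0, np.nan, 70.0, 25.0],
    "Income": [52000.0, 0.0, np.nan, 40000.0],
})

# Drop rows only when a specific column is missing
has_age = people.dropna(subset=["Age"])

# Fill each column with its own replacement value
filled = people.fillna({"Age": people["Age"].median(), "Income": 0})

print(has_age)
print(filled)
```

Both calls return new DataFrames and leave `people` unchanged.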


Data Visualization


import matplotlib.pyplot as plt

# Create a bar plot of AgeGroup counts
df["AgeGroup"].value_counts().plot(kind="bar")
plt.xlabel("Age Group")
plt.ylabel("Count")
plt.title("Age Group Distribution")
plt.show()
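One detail worth knowing: value_counts() orders the bars by frequency, not by category. The sketch below, on made-up ages, uses sort_index() to restore the category order and saves the chart to a file instead of opening a window. The "Agg" backend and the output file name are assumptions chosen so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages for illustration
ages = pd.Series([5, 22, 25, 41, 67, 70, 72])
age_group = pd.cut(
    ages,
    bins=[0, 18, 30, 50, 100],
    labels=["Child", "Young", "Adult", "Senior"],
)

# value_counts() sorts by frequency; sort_index() restores category order
counts = age_group.value_counts().sort_index()

ax = counts.plot(kind="bar")
ax.set_xlabel("Age Group")
ax.set_ylabel("Count")
ax.set_title("Age Group Distribution")
plt.tight_layout()
plt.savefig("age_group_distribution.png")  # hypothetical output file name
```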





