Pandas Code Snippets: A Simple Guide to Python Data Analysis

Intro – Pandas Code Snippets

Whether you are a seasoned data scientist looking to brush up your skills or a budding programmer, revisiting the basics and experimenting with snippets of code can pave the way for more complex data handling projects. This article unfolds a series of hands-on Pandas code snippets to rekindle your understanding and application of this indispensable library.

Importing Pandas:

No output is produced in this step. This line imports the Pandas library and aliases it as pd.

import pandas as pd

Creating a DataFrame in Pandas

Creates a DataFrame from a dictionary. The keys of the dictionary become the column names, and the values become the data in the columns.

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}

df = pd.DataFrame(data)

print(df)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Reading Data from a CSV File in Pandas:

Reads data from a CSV file and creates a DataFrame from it.

# Assume the file 'data.csv' contains:
# Name,Age
# Alice,25
# Bob,30

df = pd.read_csv('data.csv')

print(df)

Output

    Name  Age
0  Alice   25
1    Bob   30
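
If no 'data.csv' file is at hand, the same call can be tried against an in-memory string. The sketch below assumes the file contents shown above; the names csv_text and df_csv are only illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of 'data.csv'
csv_text = "Name,Age\nAlice,25\nBob,30\n"

# read_csv accepts any file-like object, so StringIO works in place of a path
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
print(df_csv.dtypes)  # shows the dtype pandas inferred for each column
```

The first line of the text is treated as the header row, which is the default behaviour of read_csv.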

Selecting a Column in a Pandas DataFrame

This snippet selects the ‘Name’ column from the DataFrame.

# Assume the file 'data.csv' contains:
# Name,Age
# Alice,25
# Bob,30

df = pd.read_csv('data.csv')

selected_column = df['Name']

print(selected_column)

Output

0    Alice
1      Bob
Name: Name, dtype: object

Filtering Rows in Pandas by Column Value

Filters the rows to include only those where the age is greater than 25.

# Using DataFrame from snippet 2

filtered_rows = df[df['Age'] > 25]

print(filtered_rows)

Output

      Name  Age
1      Bob   30
2  Charlie   35
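
Filters can also be combined. The following is a small sketch, assuming the same three-row data as above; df_people, between and either are illustrative names.

```python
import pandas as pd

df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Wrap each condition in parentheses; & means "and", | means "or"
between = df_people[(df_people['Age'] > 25) & (df_people['Age'] < 35)]
either = df_people[(df_people['Age'] < 30) | (df_people['Name'] == 'Charlie')]

print(between)
print(either)
```

The parentheses matter because & and | bind more tightly than the comparison operators in Python.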

Adding a New Column in Pandas Dataframe

Adds a new column to the DataFrame, indicating whether each individual is an adult based on their age.

# Using DataFrame from snippet 2

df['Is Adult'] = df['Age'] >= 18

print(df)

Output

      Name  Age  Is Adult
0    Alice   25      True
1      Bob   30      True
2  Charlie   35      True

Dropping a Column in Pandas

Drops the ‘Is Adult’ column. drop() returns a new DataFrame and leaves df unchanged.

# Using DataFrame from snippet 6

df_dropped = df.drop(columns=['Is Adult'])

print(df_dropped)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Setting Index in Pandas

Sets the ‘Name’ column as the index of the DataFrame.

# Using DataFrame from snippet 2

df_indexed = df.set_index('Name')

print(df_indexed)

Output

         Age
Name        
Alice     25
Bob       30
Charlie   35

Resetting Index in Pandas

Resets the index of the DataFrame to the default integer index.

# Using DataFrame from snippet 8

df_reset = df_indexed.reset_index()

print(df_reset)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Grouping Data in Pandas

Groups the data by the ‘Department’ column and calculates the mean salary for each department.

data = {
    'Department': ['HR', 'IT', 'HR', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 60000, 55000, 62000]
}

df = pd.DataFrame(data)

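# numeric_only=True skips the non-numeric 'Employee' column;
# pandas 2.0 and later raise a TypeError without it
grouped = df.groupby('Department').mean(numeric_only=True)

print(grouped)

Output

             Salary
Department         
HR          52500.0
IT          61000.0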
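
More than one statistic can be computed per group in a single call. This is a minimal sketch using the same department data; df_staff and summary are illustrative names.

```python
import pandas as pd

df_staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 60000, 55000, 62000]
})

# Select the numeric column first, then ask for several aggregates at once
summary = df_staff.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count'])

print(summary)
```

Selecting ‘Salary’ before aggregating keeps the text column ‘Employee’ out of the calculation.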

Merging DataFrames in Python

Merges two DataFrames on the ‘Key’ column, keeping only the rows that have a key present in both DataFrames (inner join).

data1 = {
    'Key': ['A', 'B', 'C'],
    'Value1': [1, 2, 3]
}

data2 = {
    'Key': ['B', 'C', 'D'],
    'Value2': [4, 5, 6]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

merged_df = pd.merge(df1, df2, on='Key', how='inner')

print(merged_df)

Output

  Key  Value1  Value2
0   B       2       4
1   C       3       5
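
To see which keys were left out of an inner join, an outer join with indicator=True can help. A short sketch on the same data follows; df_left, df_right and outer_df are illustrative names.

```python
import pandas as pd

df_left = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
df_right = pd.DataFrame({'Key': ['B', 'C', 'D'], 'Value2': [4, 5, 6]})

# how='outer' keeps every key from both sides;
# indicator=True adds a '_merge' column naming the source of each row
outer_df = pd.merge(df_left, df_right, on='Key', how='outer', indicator=True)

print(outer_df)
```

Rows that exist on only one side get NaN in the columns that come from the other side.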

Sorting Data in Python

Sorts the DataFrame by the ‘Age’ column in descending order.

# Using DataFrame from snippet 2

sorted_df = df.sort_values(by='Age', ascending=False)

print(sorted_df)

Output

      Name  Age
2  Charlie   35
1      Bob   30
0    Alice   25
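
Sorting by more than one column works the same way. The data below is hypothetical (it adds a fourth person so that ages tie), and df_ties and sorted_multi are illustrative names.

```python
import pandas as pd

# Hypothetical data with repeated ages so the second sort key matters
df_ties = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [30, 25, 30, 25]
})

# Oldest first; within the same age, names in alphabetical order
sorted_multi = df_ties.sort_values(by=['Age', 'Name'], ascending=[False, True])

print(sorted_multi)
```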

Renaming Columns in Pandas

Renames the columns of the DataFrame.

# Assuming df from previous examples
df_renamed = df.rename(columns={'Name': 'Employee Name', 'Age': 'Employee Age'})
print(df_renamed)

Output

  Employee Name  Employee Age
0        Alice           25
1          Bob           30
2      Charlie           35

Handling Missing Data in Pandas

Fills missing values with the mean of the non-null values in their respective columns.

import numpy as np

data = {
    'A': [1, 2, np.nan],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
}

df = pd.DataFrame(data)

df_filled = df.fillna(value=df.mean())

print(df_filled)

Output

     A    B  C
0  1.0  4.0  7
1  2.0  4.0  8
2  1.5  4.0  9
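
A different fill value can be chosen for each column by passing a dictionary to fillna. This sketch reuses the same data; the choice of median for ‘A’ and 0 for ‘B’ is only an example, and df_gaps and filled are illustrative names.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
})

# Keys are column names, values are what to put in that column's gaps
filled = df_gaps.fillna({'A': df_gaps['A'].median(), 'B': 0})

print(filled)
```

Columns that are not listed in the dictionary, such as ‘C’ here, are left as they are.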

Applying Functions in Pandas

Applies a function to each element in the ‘Age’ column.

# Assuming df from previous examples

def age_group(age):
    return 'Adult' if age >= 18 else 'Minor'

df['Age Group'] = df['Age'].apply(age_group)

print(df)

Output

      Name  Age Age Group
0    Alice   25     Adult
1      Bob   30     Adult
2  Charlie   35     Adult
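
For a simple two-way rule like this one, a vectorized alternative to apply is numpy.where. The sketch below uses hypothetical ages (including one minor) so that both labels appear; df_ages is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one age below 18 so both outcomes are visible
df_ages = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eve'],
    'Age': [25, 30, 12]
})

# np.where evaluates the condition for the whole column at once
df_ages['Age Group'] = np.where(df_ages['Age'] >= 18, 'Adult', 'Minor')

print(df_ages)
```

On large DataFrames this tends to be faster than calling a Python function once per row, although apply remains the more flexible option.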

Descriptive Statistics in Pandas

Provides descriptive statistics for the numerical columns in the DataFrame.

# Assuming df from previous examples

print(df.describe())

Output

             Age
count   3.000000
mean   30.000000
std     5.000000
min    25.000000
25%    27.500000
50%    30.000000
75%    32.500000
max    35.000000

Unique Values in Pandas

Finds the unique values in the ‘Name’ column.

# Assuming df from previous examples

unique_names = df['Name'].unique()

print(unique_names)

Output

['Alice' 'Bob' 'Charlie']

Value Counts in Pandas

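Counts the occurrences of each unique value in the ‘Fruit’ column. The output below is from pandas 1.x; pandas 2.0 and later name the resulting Series count and print Fruit as the index name instead.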

data = {
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Banana']
}

df = pd.DataFrame(data)

fruit_counts = df['Fruit'].value_counts()

print(fruit_counts)

Output

Banana    3
Apple     2
Name: Fruit, dtype: int64
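
The same method can report proportions instead of raw counts. A brief sketch on the same fruit data; df_fruit and shares are illustrative names.

```python
import pandas as pd

df_fruit = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Banana']
})

# normalize=True divides each count by the total number of rows
shares = df_fruit['Fruit'].value_counts(normalize=True)

print(shares)
```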

Converting a DataFrame to a NumPy Array in Pandas

Converts the DataFrame to a NumPy array.

# Assuming df from previous examples

array = df.values

print(array)

Output

[['Alice' 25]
 ['Bob' 30]
 ['Charlie' 35]]

Concatenating DataFrames in Pandas

Concatenates two DataFrames along the row axis.

data1 = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}

data2 = {
    'A': [7, 8, 9],
    'B': [10, 11, 12]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

concatenated_df = pd.concat([df1, df2])

print(concatenated_df)

Output

   A   B
0  1   4
1  2   5
2  3   6
0  7  10
1  8  11
2  9  12
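
By default pd.concat keeps each DataFrame's original index, so the labels 0, 1 and 2 appear twice in the result. The sketch below shows one way to get a fresh index; df_a, df_b and stacked are illustrative names.

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_b = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# ignore_index=True discards the old labels and numbers the rows 0..n-1
stacked = pd.concat([df_a, df_b], ignore_index=True)

print(stacked)
```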

DataFrame Slicing in Pandas

Slices the DataFrame to include only the rows at positions 1 and 2 (0-indexed); the end of an iloc slice is exclusive.

# Assuming df from previous examples
sliced_df = df.iloc[1:3]
print(sliced_df)

Output

      Name  Age
1      Bob   30
2  Charlie   35
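
Slicing by label with loc behaves slightly differently from slicing by position with iloc. A small sketch, assuming the names are used as the index; df_named, by_position and by_label are illustrative names.

```python
import pandas as pd

df_named = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}).set_index('Name')

# iloc slices by position and excludes the end position
by_position = df_named.iloc[1:3]

# loc slices by label and includes the end label
by_label = df_named.loc['Alice':'Bob']

print(by_position)
print(by_label)
```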

Changing Data Types in Pandas

Changes the data type of the ‘Age’ column to float.

# Assuming df from previous examples
df['Age'] = df['Age'].astype(float)
print(df.dtypes)

Output

Name     object
Age     float64
dtype: object

Getting Column Names in Pandas

Retrieves the column names of the DataFrame.

# Assuming df from previous examples
columns = df.columns
print(columns)

Output

Index(['Name', 'Age'], dtype='object')

Checking for Missing Data in Pandas

Returns a DataFrame of the same shape with True wherever a value is missing.

import numpy as np

data = {
    'A': [1, 2, np.nan],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
}

df = pd.DataFrame(data)

missing_data = df.isnull()
print(missing_data)

Output

       A      B      C
0  False  False  False
1  False   True  False
2   True   True  False

Dropping Rows with Missing Data in Pandas

Removes every row that contains at least one missing value.

# Using DataFrame from snippet 24

df_no_missing = df.dropna()
print(df_no_missing)

Output

     A    B  C
0  1.0  4.0  7
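
Dropping every row with any gap can discard a lot of data. The sketch below shows two narrower options on the same data; df_gaps and the result names are illustrative.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
})

# Only drop rows where column 'A' is missing
rows_with_a = df_gaps.dropna(subset=['A'])

# Drop columns (not rows) that contain any missing value
complete_columns = df_gaps.dropna(axis='columns')

print(rows_with_a)
print(complete_columns)
```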

Finding the Index of Maximum and Minimum Values in Pandas

Finds the index of the maximum and minimum values in the ‘Age’ column.

# Assuming df from previous examples
max_age_index = df['Age'].idxmax()
min_age_index = df['Age'].idxmin()
print(f'Max Age Index: {max_age_index}, Min Age Index: {min_age_index}')

Output

Max Age Index: 2, Min Age Index: 0

Saving a DataFrame to CSV in Pandas

Saves the DataFrame to a CSV file named ‘output.csv’. Nothing is printed to the console.

# Assuming df from previous examples
df.to_csv('output.csv', index=False)
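
To check what would be written without touching the disk, to_csv can be called without a path, in which case it returns the CSV text as a string. A minimal round-trip sketch; df_people, csv_text and round_trip are illustrative names.

```python
import io

import pandas as pd

df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# With no path argument, to_csv returns the CSV content as a string
csv_text = df_people.to_csv(index=False)
print(csv_text)

# Reading the text back should reproduce the original DataFrame
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(df_people))
```

Passing index=False keeps the row labels out of the file, so reading it back does not produce an extra unnamed column.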

Creating a DataFrame from a Series in Pandas

Creates a Pandas Series from the data ['Alice', 'Bob', 'Charlie']. The name parameter gives the Series a name, in this case 'Name'.

Calling to_frame() on the Series converts it into a DataFrame, and the name of the Series becomes the column name.

series = pd.Series(['Alice', 'Bob', 'Charlie'], name='Name')
df_series = series.to_frame()
print(df_series)

Output

      Name
0    Alice
1      Bob
2  Charlie
