Cheatsheet

Selecting and Filtering Data

Selecting and filtering data are fundamental operations when working with Pandas DataFrames. Pandas provides multiple methods for selecting specific rows and columns based on labels, positions, and conditions.

1. Selecting Data

1.1 Selecting Columns

Using column names:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"]
})

# Select a single column
print(df["Name"])

# Select multiple columns
print(df[["Name", "City"]])

1.2 Selecting Rows

Using .loc[] (label-based indexing):

# Select a row by index label
print(df.loc[0])  # First row

Using .iloc[] (position-based indexing):

# Select a row by position
print(df.iloc[2])  # Third row

Selecting multiple rows:

print(df.loc[0:2])  # Select rows 0 to 2
print(df.iloc[0:2])  # Select first two rows

2. Filtering Data

Filtering is used to extract subsets of a DataFrame that meet specific conditions.

2.1 Filtering Rows Based on Condition

# Select rows where Age > 30
print(df[df["Age"] > 30])

2.2 Filtering with Multiple Conditions

Using & (AND condition):

# Select rows where Age > 30 and City is "Chicago"
print(df[(df["Age"] > 30) & (df["City"] == "Chicago")])

Using | (OR condition):

# Select rows where Age < 30 or City is "Houston"
print(df[(df["Age"] < 30) | (df["City"] == "Houston")])

Using .isin() for filtering specific values:

# Select rows where City is in ["New York", "Houston"]
print(df[df["City"].isin(["New York", "Houston"])])

Using .between() for range-based filtering:

# Select rows where Age is between 25 and 35
print(df[df["Age"].between(25, 35)])

Using .query() for SQL-like filtering:

print(df.query("Age > 30 and City == 'Chicago'"))

3. Selecting Specific Rows and Columns

# Select specific rows and columns
print(df.loc[df["Age"] > 30, ["Name", "City"]])

# Select first two rows and first two columns
print(df.iloc[0:2, 0:2])

Mini Tutorial: Filtering a Real Dataset

Let's assume we have a dataset data.csv with employee information:

Step 1: Load the Dataset

df = pd.read_csv("data.csv")

Step 2: Explore the Data

print(df.head())  # View the first few rows
print(df.info())  # Get summary information

Step 3: Apply Filtering

# Employees older than 40
older_employees = df[df["Age"] > 40]

# Employees in the IT department
it_employees = df[df["Department"] == "IT"]

# Employees with salary between 50K and 100K
mid_salary_employees = df[df["Salary"].between(50000, 100000)]

# IT employees in New York
ny_it_employees = df[(df["Department"] == "IT") & (df["City"] == "New York")]

Step 4: Save the Filtered Data

ny_it_employees.to_csv("filtered_data.csv", index=False)

Conclusion

Use .loc[] and .iloc[] for row and column selection.
Use conditional filtering with boolean operators (&, |).
Use .query() for SQL-like filtering.
Use .isin() and .between() for specialized filtering.

Would you like an example using PandasGUI for interactive filtering? 🚀