NumPy

Explore pandas functions that also have NumPy equivalents

Pandas is built for data manipulation and analysis, while NumPy is built for numerical computing.

Compare Pandas and NumPy

1. Shift

The pandas.Series.shift() function shifts the values in a Series up or down by a specified number of periods and fills the vacated positions with NaN.

Pandas: Shifts the values of a series by a certain number of periods. NumPy Equivalent: You can use np.roll() to achieve a similar effect. Note that np.roll() wraps values around, so the last element moves to the front instead of being replaced by NaN.

Example

# Pandas
df['shifted'] = df['column'].shift(1)

# NumPy (the last value wraps around to row 0 instead of becoming NaN)
df['shifted'] = np.roll(df['column'].values, 1)
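
The difference between the two calls is easiest to see on concrete data. The sketch below is one possible way to reproduce the NaN-filling behavior of shift() with plain NumPy; shift_with_nan is a hypothetical helper name introduced here for illustration, not a NumPy function, and the sample values are assumed.

```python
import numpy as np
import pandas as pd

def shift_with_nan(values, periods=1):
    """Hypothetical helper: shift a 1-D array and pad the gap with NaN."""
    values = np.asarray(values, dtype=float)  # float so NaN can be stored
    if periods == 0:
        return values.copy()
    shifted = np.full(values.shape, np.nan)
    if periods > 0:
        shifted[periods:] = values[:-periods]   # move values down
    else:
        shifted[:periods] = values[-periods:]   # move values up
    return shifted

df = pd.DataFrame({'column': [30, 32, 35, 31, 29]})
rolled = np.roll(df['column'].values, 1)           # 29 wraps to position 0
shifted = shift_with_nan(df['column'].values, 1)   # position 0 becomes NaN
print(rolled)
print(shifted)
```

If the wrap-around is not wanted, a helper along these lines (or simply overwriting the first element of the rolled array with NaN after converting to float) keeps the NumPy result aligned with pandas.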

2. Diff

pandas.Series.diff() is used to calculate the difference between consecutive elements in a given series.

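You can achieve the same result with NumPy using np.diff(). Because np.diff() returns an array one element shorter than its input, pass prepend=np.nan to pad the first position and keep the lengths equal.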

Example

# Pandas
df['difference'] = df['column'].diff()
# NumPy (prepend=np.nan keeps the result the same length as the column)
df['difference_np'] = np.diff(df['column'], prepend=np.nan)

Output

   column  difference  difference_np
0      30         NaN             NaN
1      32         2.0             2.0
2      35         3.0             3.0
3      31        -4.0            -4.0
4      29        -2.0            -2.0

3. Apply

pandas.DataFrame.apply() applies a function along an axis of a DataFrame. With axis=1, for example, you can add two columns row by row to form a new column.

Pandas: Apply a function to each row or column. NumPy Equivalent: Use np.apply_along_axis() for similar functionality on NumPy arrays.

Example

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
df = pd.DataFrame(data)
# Using Pandas
df['result'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
# Using NumPy
result = np.apply_along_axis(lambda row: row[0] + row[1], 1, df[['a', 'b']].values)
print(df)
print("NumPy Result:", result)

Output

   a  b  result
0  1  4       5
1  2  5       7
2  3  6       9
NumPy Result: [5 7 9]
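
For a simple element-wise sum like this one, both apply() and np.apply_along_axis() call the Python function once per row, which is usually slower than a vectorized expression. A minimal sketch of the vectorized alternative, assuming the same two-column data as in the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Pandas: arithmetic on whole columns is vectorized
pandas_sum = df['a'] + df['b']

# NumPy: sum across axis 1 gives one result per row
numpy_sum = df[['a', 'b']].to_numpy().sum(axis=1)

print(pandas_sum.tolist())
print(numpy_sum)
```

apply() and np.apply_along_axis() remain useful when the per-row logic cannot be expressed as array operations.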

4. Rank

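pandas.Series.rank() computes the rank of each element in a Series. By default, tied values receive the average of their ranks.

Pandas: Computes the rank of each element in a Series. NumPy Equivalent: You can achieve similar functionality by applying np.argsort() twice, although this assigns distinct ranks to tied values instead of averaging them.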

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [50, 20, 80, 60]}
df = pd.DataFrame(data)
# Using Pandas
df['rank'] = df['column'].rank()
# Using NumPy
ranks = np.argsort(np.argsort(df['column'].values))
df['numpy_rank'] = ranks + 1  # Adding 1 because ranks start from 1 in Pandas
print(df)

Output

   column  rank  numpy_rank
0      50   2.0           2
1      20   1.0           1
2      80   4.0           4
3      60   3.0           3
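
The two approaches agree here only because the sample has no duplicate values. The sketch below shows what happens with a tie and one possible NumPy-only way to get average ranks; average_rank is a hypothetical helper name introduced for illustration, and the sample values are assumed.

```python
import numpy as np
import pandas as pd

def average_rank(values):
    """Hypothetical helper: 1-based ranks where ties share their mean rank."""
    values = np.asarray(values)
    # inverse maps each element to its unique value; counts holds group sizes
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)          # highest ordinal rank in each tie group
    first_rank = last_rank - counts + 1    # lowest ordinal rank in each tie group
    return ((first_rank + last_rank) / 2)[inverse]

values = np.array([50, 20, 50, 60])

# Double argsort: the two 50s get different ranks (2 and 3)
ordinal = np.argsort(np.argsort(values, kind='stable'), kind='stable') + 1

# Average ranks: the two 50s share rank 2.5, as in pandas
averaged = average_rank(values)
pandas_rank = pd.Series(values).rank()

print(ordinal)
print(averaged)
print(pandas_rank.tolist())
```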

5. IsIn

pandas.Series.isin() Pandas: Check if elements in a Series are in a given list or array. NumPy Equivalent: Use np.isin() to perform this task (the older np.in1d() is deprecated as of NumPy 2.0).

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {
    'product': ['TV', 'Sofa', 'Laptop', 'Table', 'Shirt', 'Headphones', 'Shoes'],
    'category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Clothing', 'Electronics', 'Apparel']
}
df = pd.DataFrame(data)
# Categories to check
target_categories = ['Electronics', 'Furniture', 'Clothing']
# Using Pandas
df['is_in'] = df['category'].isin(target_categories)
# Using NumPy
df['is_in_np'] = np.isin(df['category'], target_categories)
# Display the DataFrame
print(df)

Output:

      product     category  is_in  is_in_np
0          TV  Electronics   True      True
1        Sofa    Furniture   True      True
2      Laptop  Electronics   True      True
3       Table    Furniture   True      True
4       Shirt     Clothing   True      True
5  Headphones  Electronics   True      True
6       Shoes      Apparel  False     False

6. CumSum

Pandas: Calculates the cumulative sum of the values in a series. NumPy Equivalent: Use np.cumsum() for cumulative summation.

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Using Pandas to calculate cumulative sum
df['cumsum_pandas'] = df['column'].cumsum()
# Using NumPy to calculate cumulative sum
df['cumsum_numpy'] = np.cumsum(df['column'].values)
print(df)

Output:

   column  cumsum_pandas  cumsum_numpy
0       1              1             1
1       2              3             3
2       3              6             6
3       4             10            10
4       5             15            15

7. Expanding

The pandas.Series.expanding() function expands a window over the data, from the first element up to the current one, to compute cumulative statistics such as the running mean.

NumPy Equivalent: You can achieve this manually with np.cumsum() and computing the desired statistic over expanding windows.

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Using Pandas to calculate the expanding mean
df['expanding_mean'] = df['column'].expanding().mean()
# Using NumPy to calculate the manual expanding mean
expanding_mean = np.cumsum(df['column'].values) / np.arange(1, len(df) + 1)
# Adding the NumPy result to the DataFrame
df['expanding_mean_numpy'] = expanding_mean
# Display the DataFrame
print(df)

Output:

   column  expanding_mean  expanding_mean_numpy
0       1             1.0                  1.0
1       2             1.5                  1.5
2       3             2.0                  2.0
3       4             2.5                  2.5
4       5             3.0                  3.0
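
The cumulative-sum approach covers the running mean; other expanding statistics need a different accumulation. As a sketch on a small illustrative column (the values are assumed, not taken from the example), np.maximum.accumulate() and np.minimum.accumulate() give the running maximum and minimum:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [3, 1, 4, 1, 5]})
values = df['column'].to_numpy()

# Running maximum and minimum over an expanding window
running_max = np.maximum.accumulate(values)
running_min = np.minimum.accumulate(values)

# Pandas counterparts for comparison (these return floats)
pandas_max = df['column'].expanding().max()
pandas_min = df['column'].expanding().min()

print(running_max)
print(running_min)
```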

8. Pct_Change

Pandas: Computes the fractional change between the current and prior element (multiply by 100 for a percentage). NumPy Equivalent: Divide np.diff() of the values by the preceding elements, then pad the first position with NaN.

Example:

import pandas as pd
import numpy as np
# Sample sales data
data = {'day': ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'],
        'sales': [100, 110, 150, 120, 130]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to compute percentage change
df['pct_change_pandas'] = df['sales'].pct_change()
# Using NumPy to compute percentage change manually
pct_change_numpy = np.diff(df['sales'].values) / df['sales'].values[:-1]
pct_change_numpy = np.insert(pct_change_numpy, 0, np.nan)  # Insert NaN at the start
# Add the NumPy result to the DataFrame
df['pct_change_numpy'] = pct_change_numpy
# Display the DataFrame
print(df)

Output

     day  sales  pct_change_pandas  pct_change_numpy
0  Day 1    100                NaN               NaN
1  Day 2    110           0.100000          0.100000
2  Day 3    150           0.363636          0.363636
3  Day 4    120          -0.200000         -0.200000
4  Day 5    130           0.083333          0.083333

9. Fill NA

Pandas: Fills NaN values with a specified value or method. NumPy Equivalent: Use np.where() to replace NaN values in a NumPy array.

Example:

import pandas as pd
import numpy as np
# Sample data with NaN values
data = {'scores': [100, 200, np.nan, 300, np.nan]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to fill NaN values
df['filled_pandas'] = df['scores'].fillna(0)
# Using NumPy to fill NaN values manually
df['filled_numpy'] = np.where(np.isnan(df['scores'].values), 0, df['scores'].values)
# Display the DataFrame
print(df)


Output:

   scores  filled_pandas  filled_numpy
0   100.0          100.0         100.0
1   200.0          200.0         200.0
2     NaN            0.0           0.0
3   300.0          300.0         300.0
4     NaN            0.0           0.0
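
When the fill value is a constant, np.nan_to_num() is a shorter alternative to the np.where() pattern. A minimal sketch, assuming the same scores as in the example; note that by default np.nan_to_num() also replaces positive and negative infinity with very large finite numbers, which fillna() does not do.

```python
import numpy as np

scores = np.array([100.0, 200.0, np.nan, 300.0, np.nan])

# Replace NaN with 0.0; the input array is left unchanged (copy=True by default)
filled = np.nan_to_num(scores, nan=0.0)

print(filled)
```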

10. Drop NA

The pandas.DataFrame.dropna() function removes rows (or columns) that contain NaN values.

Pandas: Drops rows or columns with NaN values. NumPy Equivalent: Use ~np.isnan() as a boolean mask to filter out rows containing NaN.

Example:

import pandas as pd
import numpy as np
# Sample data with NaN values
data = {'scores': [100, 200, np.nan, 300, 150]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to drop rows with NaN values
df_cleaned_pandas = df.dropna()
# Using NumPy to manually filter out rows with NaN values
df_cleaned_numpy = df[~np.isnan(df['scores'].values)]
# Display the results
print("Original DataFrame:\n", df)
print("\nPandas dropna():\n", df_cleaned_pandas)
print("\nNumPy equivalent:\n", df_cleaned_numpy)

Output

Original DataFrame:
    scores
0   100.0
1   200.0
2     NaN
3   300.0
4   150.0
Pandas dropna():
    scores
0   100.0
1   200.0
3   300.0
4   150.0
NumPy equivalent:
    scores
0   100.0
1   200.0
3   300.0
4   150.0
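
The example checks a single column, whereas dropna() by default drops a row if any of its columns is NaN. A sketch of the NumPy counterpart for a DataFrame with several numeric columns combines np.isnan() with any(axis=1); the column names and values here are illustrative, and np.isnan() only works on numeric dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'math': [90.0, np.nan, 75.0, 60.0],
                   'physics': [85.0, 70.0, np.nan, 65.0]})

# True for every row that has at least one NaN
has_nan = np.isnan(df.to_numpy()).any(axis=1)

cleaned_numpy = df[~has_nan]    # keep only complete rows
cleaned_pandas = df.dropna()    # pandas counterpart

print(cleaned_numpy)
```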

11. Value Counts

Pandas: Counts the unique values in a series. NumPy Equivalent: Use np.unique() with return_counts=True to get unique values and their counts.

Example:

Responses: ['A', 'B', 'A', 'C', 'B', 'A', 'B']

import pandas as pd
import numpy as np
# Sample data
data = {'responses': ['A', 'B', 'A', 'C', 'B', 'A', 'B']}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to count unique values
value_counts_pandas = df['responses'].value_counts()
# Using NumPy to count unique values manually
unique, counts = np.unique(df['responses'].values, return_counts=True)
value_counts_numpy = dict(zip(unique, counts))  # Combine unique values and counts
# Display the results
print("Pandas value_counts():\n", value_counts_pandas)
print("\nNumPy equivalent:\n", value_counts_numpy)

Output:

Pandas value_counts():
 A    3
 B    3
 C    1
Name: responses, dtype: int64
NumPy equivalent:
 {'A': 3, 'B': 3, 'C': 1}
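
One difference is hidden by this sample: np.unique() returns the values in sorted order, while value_counts() orders them by count, largest first. If the ordering matters, the counts can be re-sorted. A minimal sketch on assumed data, using a stable argsort on the negated counts:

```python
import numpy as np
import pandas as pd

responses = np.array(['B', 'C', 'B', 'A', 'B', 'C'])

unique, counts = np.unique(responses, return_counts=True)  # sorted by value: A, B, C
order = np.argsort(-counts, kind='stable')                 # indices by descending count

# Convert to plain Python types so the dict prints cleanly
sorted_counts = {str(k): int(v) for k, v in zip(unique[order], counts[order])}
pandas_counts = pd.Series(responses).value_counts()

print(sorted_counts)
print(pandas_counts.index.tolist())
```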