Skip to content

NumPy

Explore pandas functions that can also use with numpy

  • Pandas for data manipulation and analysis or
  • NumPy for numerical computing

Compare Pandas and Numpy

1. Shift

The pandas.Series.shift() function shifts the values of a particular position in a Series up or down by a specified number of periods.

Pandas: Shifts the values of a series by a certain number of periods. NumPy Equivalent: You can use np.roll() to achieve a similar effect.

Example

# Pandas
df['shifted'] = df['column'].shift(1)

#  NumPy
df['shifted'] = np.roll(df['column'].values, 1)

2. Diff

pandas.Series.diff() is used to calculate the difference between consecutive elements in a given series.

Interestingly, you can achieve the same result with NumPy using np.diff().

Example

# Pandas
df['difference'] = df['column'].diff()
# NumPy
df['difference'] = np.diff(df['column'], prepend=np.nan)

Output

output
   column  difference  difference_np
0      30         NaN             NaN
1      32         2.0             2.0
2      35         3.0             3.0
3      31        -4.0            -4.0
4      29        -2.0            -2.0

3. Apply

pandas.DataFrame.apply() you can add two rows and form a new one as an answer.

Pandas: Apply a function to each row or column. NumPy Equivalent: Use np.apply_along_axis() for similar functionality on NumPy arrays.

Example

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
df = pd.DataFrame(data)
# Using Pandas
df['result'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
# Using NumPy
result = np.apply_along_axis(lambda row: row[0] + row[1], 1, df[['a', 'b']].valu
print(df)
print("NumPy Result:", result)

Output

   a  b  result
0  1  4       5
1  2  5       7
2  3  6       9
NumPy Result: [5 7 9]

4. Rank

pandas.Series.rank()

Pandas: Computes the rank of each element in a Series. NumPy Equivalent: You can achieve similar functionality using np.argsort().

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [50, 20, 80, 60]}
df = pd.DataFrame(data)
# Using Pandas
df['rank'] = df['column'].rank()
# Using NumPy
ranks = np.argsort(np.argsort(df['column'].values))
df['numpy_rank'] = ranks + 1  # Adding 1 because ranks start from 1 in Pandas
print(df)

Output

   column  rank  numpy_rank
0      50   2.0           2
1      20   1.0           1
2      80   4.0           4
3      60   3.0           3

5. IsIn

pandas.Series.isin() Pandas: Check if elements in a Series are in a given list or array. NumPy Equivalent: Use np.in1d() to perform this task.

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {
    'product': ['TV', 'Sofa', 'Laptop', 'Table', 'Shirt', 'Headphones', 'Shoes']
    'category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Clothi
}
df = pd.DataFrame(data)
# Categories to check
target_categories = ['Electronics', 'Furniture', 'Clothing']
# Using Pandas
df['is_in'] = df['category'].isin(target_categories)
# Using NumPy
df['is_in_np'] = np.in1d(df['category'], target_categories)
# Display the DataFrame
print(df)

Output:

      product    category  is_in  is_in_np
0          TV  Electronics   True      True
1        Sofa   Furniture    True      True
2      Laptop  Electronics   True      True
3       Table   Furniture    True      True
4       Shirt   Clothing     True      True
5  Headphones  Electronics   True      True
6       Shoes     Apparel   False     False

6. CumSum

Pandas: Calculates the cumulative sum of the values in a series. NumPy Equivalent: Use np.cumsum() for cumulative summation.

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Using Pandas to calculate cumulative sum
df['cumsum_pandas'] = df['column'].cumsum()
# Using NumPy to calculate cumulative sum
df['cumsum_numpy'] = np.cumsum(df['column'].values)
print(df)

Output:

   column  cumsum_pandas  cumsum_numpy
0       1              1             1
1       2              3             3
2       3              6             6
3       4             10            10
4       5             15            15

7. Expanding

Now, we can achieve this using the pandas.Series.expanding() function, which expands a window over the data to compute cumulative statistics like the running mean.

NumPy Equivalent: You can achieve this manually with np.cumsum() and computing the desired statistic over expanding windows.

Example:

import pandas as pd
import numpy as np
# Sample DataFrame
data = {'column': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Using Pandas to calculate the expanding mean
df['expanding_mean'] = df['column'].expanding().mean()
# Using NumPy to calculate the manual expanding mean
expanding_mean = np.cumsum(df['column'].values) / np.arange(1, len(df) + 1)
# Adding the NumPy result to the DataFrame
df['expanding_mean_numpy'] = expanding_mean
# Display the DataFrame
print(df)

Output:

   column  expanding_mean  expanding_mean_numpy
0       1             1.0                  1.0
1       2             1.5                  1.5
2       3             2.0                  2.0
3       4             2.5                  2.5
4       5             3.0                  3.0

8. Pct_Change

Pandas: Computes the percentage change between the current and prior element. NumPy Equivalent: Use a combination of NumPy array operations to calculate the percentage change.

Example:

import pandas as pd
import numpy as np
# Sample sales data
data = {'day': ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'],
        'sales': [100, 110, 150, 120, 130]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to compute percentage change
df['pct_change_pandas'] = df['sales'].pct_change()
# Using NumPy to compute percentage change manually
pct_change_numpy = np.diff(df['sales'].values) / df['sales'].values[:-1]
pct_change_numpy = np.insert(pct_change_numpy, 0, np.nan)  # Insert NaN at the s
# Add the NumPy result to the DataFrame
df['pct_change_numpy'] = pct_change_numpy
# Display the DataFrame
print(df)

Output

     day  sales  pct_change_pandas  pct_change_numpy
0  Day 1    100                NaN               NaN
1  Day 2    110           0.100000          0.100000
2  Day 3    150           0.363636          0.363636
3  Day 4    120          -0.200000         -0.200000
4  Day 5    130           0.083333          0.083333

9. Fill NA

Pandas: Fills NaN values with a specified value or method. NumPy Equivalent: Use np.where() to replace NaN values in a NumPy array.

Example:

import pandas as pd
import numpy as np
# Sample data with NaN values
data = {'scores': [100, 200, np.nan, 300, np.nan]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to fill NaN values
df['filled_pandas'] = df['scores'].fillna(0)
# Using NumPy to fill NaN values manually
df['filled_numpy'] = np.where(np.isnan(df['scores'].values), 0, df['scores'].val
# Display the DataFrame
print(df)


Output:
~~~plain
   scores  filled_pandas  filled_numpy
0   100.0          100.0         100.0
1   200.0          200.0         200.0
2     NaN            0.0           0.0
3   300.0          300.0         300.0
4     NaN            0.0           0.0

10. Drop NA

The pandas.DataFrame.dropna() function removes rows (or columns) from a

Pandas: Drops rows or columns with NaN values. NumPy Equivalent: Use ~np.isnan() to filter out rows containing NaN.

Example:

import pandas as pd
import numpy as np
# Sample data with NaN values
data = {'scores': [100, 200, np.nan, 300, 150]}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to drop rows with NaN values
df_cleaned_pandas = df.dropna()
# Using NumPy to manually filter out rows with NaN values
df_cleaned_numpy = df[~np.isnan(df['scores'].values)]
# Display the results
print("Original DataFrame:\n", df)
print("\nPandas dropna():\n", df_cleaned_pandas)
print("\nNumPy equivalent:\n", df_cleaned_numpy)

Output

Original DataFrame:
    scores
0   100.0
1   200.0
2     NaN
3   300.0
4   150.0
Pandas dropna():
    scores
0   100.0
1   200.0
3   300.0
4   150.0
NumPy equivalent:
    scores
0   100.0
1   200.0
3   300.0
4   150.0

11. Value Counts

Pandas: Counts the unique values in a series. NumPy Equivalent: Use np.unique() with return_counts=True to get unique values and their counts. Example: Responses: ['A', 'B', 'A', 'C', 'B', 'A', 'B']

import pandas as pd
import numpy as np
# Sample data
data = {'responses': ['A', 'B', 'A', 'C', 'B', 'A', 'B']}
# Create DataFrame
df = pd.DataFrame(data)
# Using Pandas to count unique values
value_counts_pandas = df['responses'].value_counts()
# Using NumPy to count unique values manually
unique, counts = np.unique(df['responses'].values, return_counts=True)
value_counts_numpy = dict(zip(unique, counts))  # Combine unique values and coun
# Display the results
print("Pandas value_counts():\n", value_counts_pandas)
print("\nNumPy equivalent:\n", value_counts_numpy)

Output:

Pandas value_counts():
 A    3
 B    3
 C    1
Name: responses, dtype: int64
NumPy equivalent:
 {'A': 3, 'B': 3, 'C': 1}