Compute

Computing new values from existing data is one of the most common tasks in data analysis. Whether you’re calculating rates and percentages or creating categorical variables, pandas lets you add new columns built from arithmetic, conditional logic, dates, and custom functions.

Set up the data

import pandas as pd

# Load and prepare the data
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()

print(f"Loaded {len(accident_list)} accidents")
accident_list.head()
Loaded 163 accidents
   event_id        ntsb_make  ntsb_model  ntsb_number  year  date                 city            state  country  total_fatalities  latimes_make  latimes_model  latimes_make_and_model
0  20061222X01838  BELL       407         NYC07FA048   2006  12/14/2006 00:00:00  DAGSBORO        DE     USA      2                 BELL          407            BELL 407
1  20060817X01187  ROBINSON   R22 BETA    LAX06LA257   2006  08/10/2006 00:00:00  TUCSON          AZ     USA      1                 ROBINSON      R22            ROBINSON R22
2  20060111X00044  ROBINSON   R44         MIA06FA039   2006  01/01/2006 00:00:00  GRAND RIDGE     FL     USA      3                 ROBINSON      R44            ROBINSON R44
3  20060419X00461  ROBINSON   R44 II      DFW06FA102   2006  04/13/2006 00:00:00  FREDERICKSBURG  TX     USA      2                 ROBINSON      R44            ROBINSON R44
4  20060208X00181  ROBINSON   R44         SEA06LA052   2006  02/06/2006 00:00:00  HELENA          MT     USA      1                 ROBINSON      R44            ROBINSON R44
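Because every calculation in this chapter looks columns up by name, a name the file does not contain stops the cell with a KeyError. As a minimal sketch, a quick check like the one below can report missing columns before any math runs. The helper `missing_columns()` is hypothetical, not a pandas function, and the two-row DataFrame is made up for illustration.

```python
import pandas as pd


def missing_columns(df, required):
    """Hypothetical helper: return the names in `required` that are not columns of `df`."""
    return [name for name in required if name not in df.columns]


# A made-up two-row frame standing in for accident_list
df = pd.DataFrame({"total_fatalities": [2, 1], "state": ["DE", "AZ"]})

missing = missing_columns(df, ["total_fatalities", "total_serious_injuries"])
if missing:
    print(f"Missing columns: {missing}")
```

Run against the real accident_list, the same check would flag any injury column, since the head() output above lists only `total_fatalities` as a casualty count.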

Basic arithmetic operations

You can perform mathematical operations on columns to create new computed fields:

# The NTSB file has no injury counts, so compute from total_fatalities:
# each accident's share of all deaths in the dataset, as a percentage
total_deaths = accident_list["total_fatalities"].sum()
accident_list["fatality_share"] = accident_list["total_fatalities"] / total_deaths * 100

print("Added fatality_share column:")
accident_list[["latimes_make_and_model", "total_fatalities", "fatality_share"]].head()
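Arithmetic between columns works row by row, which is also how rates are usually built. The sketch below uses made-up numbers and a hypothetical `flights` column that the NTSB file does not contain; it shows element-wise division and what pandas does when a denominator is zero.

```python
import pandas as pd

# Made-up numbers; "flights" is a hypothetical column, not part of the NTSB file
df = pd.DataFrame({
    "make": ["BELL", "ROBINSON", "SCHWEIZER"],
    "fatalities": [2, 3, 0],
    "flights": [400, 1200, 0],
})

# Element-wise: each row's fatalities are divided by that same row's flights
df["fatalities_per_100_flights"] = df["fatalities"] * 100 / df["flights"]

# pandas does not raise on division by zero: 0 / 0 becomes NaN here
print(df)
```

Because pandas returns NaN for 0 / 0 (and inf for a nonzero value divided by zero) instead of raising an error, it is worth checking denominators before reporting a rate.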

Conditional calculations

Use numpy.where() or boolean indexing for conditional calculations:

import numpy as np

# Create severity categories from the death toll
accident_list["severity"] = np.where(
    accident_list["total_fatalities"] > 1, "Multiple fatalities",
    np.where(accident_list["total_fatalities"] == 1, "Single fatality", "No fatalities")
)

print("Severity categories:")
accident_list["severity"].value_counts()
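The same kind of categories can also be built with boolean indexing. This sketch, on a made-up four-row DataFrame, assigns a default label to every row and then overwrites the rows each boolean mask selects with `.loc`. Where masks overlap, the later assignment wins, so order the steps from least to most specific.

```python
import pandas as pd

# Made-up death tolls for illustration
df = pd.DataFrame({"total_fatalities": [0, 1, 3, 2]})

# Start every row with a default label
df["severity"] = "No fatalities"

# Overwrite only the rows where each condition is True
df.loc[df["total_fatalities"] == 1, "severity"] = "Single fatality"
df.loc[df["total_fatalities"] > 1, "severity"] = "Multiple fatalities"

print(df)
```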

Working with dates

Convert date strings to datetime objects for date calculations:

# Convert date column to datetime
accident_list["date"] = pd.to_datetime(accident_list["date"])

# Extract the year, month, and day of the week
accident_list["year"] = accident_list["date"].dt.year
accident_list["month"] = accident_list["date"].dt.month
accident_list["day_of_week"] = accident_list["date"].dt.day_name()

print("Accidents by year:")
print(accident_list["year"].value_counts().sort_index())
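Once a column holds datetimes, you can do arithmetic on it as well as pull out its parts. Here is a minimal sketch on three dates copied from the sample rows above: an explicit `format` string tells `pd.to_datetime` that the month comes first, and subtracting datetimes produces `Timedelta` values whose `.dt.days` gives whole days.

```python
import pandas as pd

# Three date strings in the NTSB file's "MM/DD/YYYY HH:MM:SS" layout
dates = pd.Series(["12/14/2006 00:00:00", "08/10/2006 00:00:00", "01/01/2006 00:00:00"])

# An explicit format removes any month-first vs. day-first ambiguity
parsed = pd.to_datetime(dates, format="%m/%d/%Y %H:%M:%S")

# Subtracting the earliest date gives Timedeltas; .dt.days converts them to integers
days_since_first = (parsed - parsed.min()).dt.days
print(days_since_first.tolist())  # [347, 221, 0]
```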

Using apply() for complex calculations

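When a calculation needs values from several columns in the same row, or branching logic that is awkward to express with whole-column operations, use the apply() method with a custom function: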

def describe_accident(row):
    """Summarize one accident, e.g. 'BELL 407 crash in DAGSBORO, DE killed 2 people'"""
    deaths = row["total_fatalities"]
    noun = "person" if deaths == 1 else "people"
    return f"{row['latimes_make_and_model']} crash in {row['city']}, {row['state']} killed {deaths} {noun}"

# axis=1 passes each row, rather than each column, to the function
accident_list["summary"] = accident_list.apply(describe_accident, axis=1)

print("Accident summaries:")
print(accident_list["summary"].head())
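`apply()` with `axis=1` calls a Python function once per row, which makes it flexible but usually slower than whole-column operations on large tables. When a vectorized form exists, it should give the same result. This sketch compares the two on a made-up two-row DataFrame.

```python
import pandas as pd

# Two made-up rows standing in for accident_list
df = pd.DataFrame({"city": ["DAGSBORO", "TUCSON"], "state": ["DE", "AZ"]})

# Row-wise: the lambda runs once for each row
with_apply = df.apply(lambda row: f"{row['city']}, {row['state']}", axis=1)

# Vectorized: the same labels built from whole columns in one step
vectorized = df["city"] + ", " + df["state"]

print(with_apply.tolist())
print(vectorized.tolist())
```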

Ranking and percentiles

Create rankings and percentile scores:

# Rank accidents by death toll, highest first; tied accidents share a rank
accident_list["fatality_rank"] = accident_list["total_fatalities"].rank(ascending=False, method="dense")

# Express each death toll as a percentile score on a 0 to 100 scale
accident_list["fatality_percentile"] = accident_list["total_fatalities"].rank(pct=True) * 100

# Show the deadliest accidents
deadliest = accident_list.nsmallest(10, "fatality_rank")
print("Top 10 deadliest accidents:")
deadliest[["ntsb_number", "date", "city", "state", "total_fatalities", "fatality_rank", "fatality_percentile"]]
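The `method` argument to `rank()` controls how ties are handled, which matters here because several accidents can share the same death toll. This sketch compares the options on a made-up series with one tie.

```python
import pandas as pd

# Made-up death tolls; the two 3s are tied
deaths = pd.Series([3, 1, 3, 2])

dense = deaths.rank(ascending=False, method="dense")  # ties share a rank, no gaps
minimum = deaths.rank(ascending=False, method="min")  # ties share the lowest rank, gaps follow
average = deaths.rank(ascending=False)                # default: ties get the mean of their ranks
percent = deaths.rank(pct=True)                       # ascending rank divided by the row count

print(dense.tolist())    # [1.0, 3.0, 1.0, 2.0]
print(minimum.tolist())  # [1.0, 4.0, 1.0, 3.0]
print(average.tolist())  # [1.5, 4.0, 1.5, 3.0]
print(percent.tolist())  # [0.875, 0.25, 0.875, 0.5]
```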

Computing new values is at the heart of most analysis. With arithmetic, conditional logic, date handling, custom functions, and rankings, you can turn raw records into the specific metrics your investigation needs.