{ "cells": [ { "cell_type": "markdown", "id": "a0a74883", "metadata": {}, "source": [ "# Compute\n", "\n", "Computing new values from existing data is one of the most common tasks in data analysis. Whether you're calculating rates, percentages, or creating categorical variables, pandas provides powerful tools for creating new columns and transforming your data." ] }, { "cell_type": "markdown", "id": "900aa078", "metadata": {}, "source": [ "## Setup data" ] }, { "cell_type": "code", "execution_count": null, "id": "af8c39ee", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Load and prepare the data\n", "accident_list = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv\")\n", "accident_list[\"latimes_make_and_model\"] = accident_list[\"latimes_make_and_model\"].str.upper()\n", "\n", "print(f\"Loaded {len(accident_list)} accidents\")\n", "accident_list.head()" ] }, { "cell_type": "markdown", "id": "e58bb3ef", "metadata": {}, "source": [ "## Basic arithmetic operations\n", "\n", "You can perform mathematical operations on columns to create new computed fields:" ] }, { "cell_type": "code", "execution_count": null, "id": "5830c7f3", "metadata": {}, "outputs": [], "source": [ "# Calculate total people involved (fatalities + injuries)\n", "accident_list[\"total_people\"] = accident_list[\"total_fatalities\"] + accident_list[\"total_serious_injuries\"] + accident_list[\"total_minor_injuries\"]\n", "\n", "print(\"Added total_people column:\")\n", "accident_list[[\"total_fatalities\", \"total_serious_injuries\", \"total_minor_injuries\", \"total_people\"]].head()" ] }, { "cell_type": "markdown", "id": "78933f38", "metadata": {}, "source": [ "## Conditional calculations\n", "\n", "Use `numpy.where()` or boolean indexing for conditional calculations:" ] }, { "cell_type": "code", "execution_count": null, "id": "0da5a5c8", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Create severity categories\n", "accident_list[\"severity\"] = np.where(\n", " accident_list[\"total_fatalities\"] > 0, \"Fatal\",\n", " np.where(accident_list[\"total_serious_injuries\"] > 0, \"Serious\", \"Minor\")\n", ")\n", "\n", "print(\"Severity categories:\")\n", "accident_list[\"severity\"].value_counts()" ] }, { "cell_type": "markdown", "id": "cb134630", "metadata": {}, "source": [ "## Working with dates\n", "\n", "Convert date strings to datetime objects for date calculations:" ] }, { "cell_type": "code", "execution_count": null, "id": "7694b953", "metadata": {}, "outputs": [], "source": [ "# Convert date column to datetime\n", "accident_list[\"date\"] = pd.to_datetime(accident_list[\"date\"])\n", "\n", "# Extract year, month, day\n", "accident_list[\"year\"] = accident_list[\"date\"].dt.year\n", "accident_list[\"month\"] = accident_list[\"date\"].dt.month\n", "accident_list[\"day_of_week\"] = accident_list[\"date\"].dt.day_name()\n", "\n", "print(\"Accidents by year:\")\n", "print(accident_list[\"year\"].value_counts().sort_index())" ] }, { "cell_type": "markdown", "id": "8dc5d3d7", "metadata": {}, "source": [ "## Using apply() for complex calculations\n", "\n", "For more complex computations, use the `apply()` method with custom functions:" ] }, { "cell_type": "code", "execution_count": null, "id": "425cd40d", "metadata": {}, "outputs": [], "source": [ "def calculate_fatality_rate(row):\n", " \"\"\"Calculate what percentage of people involved died\"\"\"\n", " if row[\"total_people\"] == 0:\n", " return 0\n", " return (row[\"total_fatalities\"] / row[\"total_people\"]) * 100\n", "\n", "accident_list[\"fatality_rate\"] = accident_list.apply(calculate_fatality_rate, axis=1)\n", "\n", "print(\"Fatality rate statistics:\")\n", "print(accident_list[\"fatality_rate\"].describe())" ] }, { "cell_type": "markdown", "id": "8591fed0", "metadata": {}, "source": [ "## Ranking and percentiles\n", "\n", "Create rankings and percentile scores:" ] }, { "cell_type": "code", "execution_count": null, "id": "b5b606ed", "metadata": {}, "outputs": [], "source": [ "# Rank accidents by total people involved\n", "accident_list[\"severity_rank\"] = accident_list[\"total_people\"].rank(ascending=False, method=\"dense\")\n", "\n", "# Show the most severe accidents\n", "most_severe = accident_list.nsmallest(10, \"severity_rank\")\n", "print(\"Top 10 most severe accidents by people involved:\")\n", "most_severe[[\"accident_number\", \"date\", \"location\", \"total_people\", \"total_fatalities\", \"severity_rank\"]].head()" ] }, { "cell_type": "markdown", "id": "2f48cc5e", "metadata": {}, "source": [ "Computing new values from your data is essential for analysis. These techniques allow you to transform raw data into meaningful insights and create the specific metrics you need for your investigations." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }