{ "cells": [ { "cell_type": "markdown", "id": "237ce6a6", "metadata": {}, "source": [ "# Charts\n", "\n", "Python has a number of charting tools that can work hand-in-hand with pandas. While [Altair](https://altair-viz.github.io/) is a relatively new package compared to classics like [matplotlib](https://matplotlib.org/), it has great documentation and is easy to configure. Let's take it for a spin." ] }, { "cell_type": "markdown", "id": "8fe0b616", "metadata": {}, "source": [ "## Setup\n", "\n", "First, let's prepare our data and import the necessary libraries:" ] }, { "cell_type": "code", "execution_count": null, "id": "1d42ff09", "metadata": {}, "outputs": [], "source": [ "# Setup data for chart examples\n", "import warnings\n", "warnings.simplefilter(\"ignore\")\n", "import pandas as pd\n", "\n", "# Load and prepare accident data\n", "accident_list = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv\")\n", "accident_list[\"latimes_make_and_model\"] = accident_list[\"latimes_make_and_model\"].str.upper()\n", "accident_counts = accident_list.groupby([\"latimes_make\", \"latimes_make_and_model\"]).size().rename(\"accidents\").reset_index()\n", "\n", "# Load survey data and merge\n", "survey = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/faa-survey.csv\")\n", "survey[\"latimes_make_and_model\"] = survey[\"latimes_make_and_model\"].str.upper()\n", "merged_list = pd.merge(accident_counts, survey, on=\"latimes_make_and_model\")\n", "\n", "# Calculate accident rates\n", "merged_list[\"per_hour\"] = merged_list.accidents / merged_list.total_hours\n", "merged_list[\"per_100k_hours\"] = (merged_list.accidents / merged_list.total_hours) * 100_000\n", "\n", "print(\"Data prepared for charting\")\n", "merged_list.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "9520676d", "metadata": {}, "outputs": [], "source": [ "import altair as alt\n", "print(\"Altair imported for data visualization\")" ] }, { "cell_type": "markdown", "id": "4f8d813d", "metadata": {}, "source": [ "```{note}\n", "If the import triggers an error that says your notebook doesn't have Altair, you can install it by running `uv add altair` in the terminal. This will download and install the library using the uv package manager.\n", "```\n", "\n", "In a typical analysis, you'd import all of your libraries in one cell at the top of the file. That way, if you need to install or make changes to the packages a notebook uses, you know where to find them and you won't hit errors importing a package midway through running a file." ] }, { "cell_type": "markdown", "id": "21a97902", "metadata": {}, "source": [ "## Make a basic bar chart\n", "\n", "With Altair imported, we can now feed it our DataFrame to make a simple bar chart. Let's take a look at the basic building block of an Altair chart: the `Chart` object. We'll tell it that we want to create a chart from `merged_list` by passing the DataFrame in:" ] }, { "cell_type": "code", "execution_count": null, "id": "879f151f", "metadata": {}, "outputs": [], "source": [ "# This will show an error - Altair needs a \"mark\" to know how to visualize the data\n", "alt.Chart(merged_list)" ] }, { "cell_type": "markdown", "id": "51ee57dc", "metadata": {}, "source": [ "OK! We got an error, but don't panic. The error says that Altair needs a \"mark\" — that is to say, it needs to know not only what data we want to visualize, but also _how_ to represent that data visually. There are lots of different marks that Altair can use (you can [check them all out here](https://altair-viz.github.io/user_guide/marks.html)). But let's try out the most versatile mark in our visualization toolbox: the bar." ] }, { "cell_type": "code", "execution_count": null, "id": "844c64a3", "metadata": {}, "outputs": [], "source": [ "# This will show another error - Altair needs to know which columns to use\n", "alt.Chart(merged_list).mark_bar()" ] }, { "cell_type": "markdown", "id": "8690a3bf", "metadata": {}, "source": [ "That's an improvement, but we've got a new error: Altair doesn't know which columns of our DataFrame to look at! At a minimum, we also need to define the column to use for the x- and y-axes. We can do that by chaining in the `encode` method." ] }, { "cell_type": "code", "execution_count": null, "id": "f5a8728b", "metadata": {}, "outputs": [], "source": [ "# Basic bar chart with accident rates\n", "alt.Chart(merged_list).mark_bar().encode(\n", " x=\"latimes_make_and_model\",\n", " y=\"per_100k_hours\"\n", ")" ] }, { "cell_type": "markdown", "id": "b9a2dc68", "metadata": {}, "source": [ "That's more like it!\n", "\n", "Here's an idea — maybe we do horizontal bars instead of vertical. How would you rewrite this chart code to reverse those bars?" ] }, { "cell_type": "code", "execution_count": null, "id": "5d698605", "metadata": {}, "outputs": [], "source": [ "# Horizontal bar chart\n", "alt.Chart(merged_list).mark_bar().encode(\n", " x=\"per_100k_hours\",\n", " y=\"latimes_make_and_model\"\n", ")" ] }, { "cell_type": "markdown", "id": "4b145a91", "metadata": {}, "source": [ "This chart is an okay start, but it's sorted alphabetically by y-axis value, which is pretty sloppy and hard to visually parse. Let's fix that.\n", "\n", "We want to sort the y-axis values by their corresponding x values. We know how to do that in Pandas, but Altair has its own opinions about how to sort a DataFrame, so it will override any sort order on the DataFrame we pass in." ] }, { "cell_type": "markdown", "id": "e9eb6466", "metadata": {}, "source": [ "## Sorting charts\n", "\n", "Instead, we need to tell Altair how we want the axis to be organized by using the `sort` parameter of the `Y` encoding:" ] }, { "cell_type": "code", "execution_count": null, "id": "9f4715cd", "metadata": {}, "outputs": [], "source": [ "# Sorted horizontal bar chart\n", "alt.Chart(merged_list).mark_bar().encode(\n", " x=\"per_100k_hours\",\n", " y=alt.Y(\"latimes_make_and_model\").sort(\"-x\")\n", ")" ] }, { "cell_type": "markdown", "id": "f0b923dc", "metadata": {}, "source": [ "Much better! Now we can easily see which helicopter models have the highest accident rates.\n", "\n", "## Adding titles and labels\n", "\n", "Let's make this chart more presentation-ready by adding a title and better axis labels:" ] }, { "cell_type": "code", "execution_count": null, "id": "f607cb18", "metadata": {}, "outputs": [], "source": [ "# Chart with title and labels\n", "alt.Chart(merged_list).mark_bar().encode(\n", " x=alt.X(\"per_100k_hours\", title=\"Accidents per 100,000 flight hours\"),\n", " y=alt.Y(\"latimes_make_and_model\", title=\"Helicopter model\").sort(\"-x\")\n", ").properties(\n", " title=\"Helicopter accident rates by model\",\n", " width=500,\n", " height=400\n", ")" ] }, { "cell_type": "markdown", "id": "1ec5a0d4", "metadata": {}, "source": [ "Perfect! We now have a professional-looking chart that clearly shows helicopter accident rates by model.\n", "\n", "This is just the beginning of what you can do with Altair. The library supports many different chart types, interactive features, and advanced styling options. You can explore more in the [Altair documentation](https://altair-viz.github.io/)." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }