Charts
Python has a number of charting tools that can work hand-in-hand with pandas. While Altair is a relatively new package compared to classics like matplotlib, it has great documentation and is easy to configure. Let’s take it for a spin.
Setup
First, let’s prepare our data and import the necessary libraries:
# Setup data for chart examples
import warnings
warnings.simplefilter("ignore")
import pandas as pd
# Load and prepare accident data
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()
accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().rename("accidents").reset_index()
# Load survey data and merge
survey = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/faa-survey.csv")
survey["latimes_make_and_model"] = survey["latimes_make_and_model"].str.upper()
merged_list = pd.merge(accident_counts, survey, on="latimes_make_and_model")
# Calculate accident rates
merged_list["per_hour"] = merged_list.accidents / merged_list.total_hours
merged_list["per_100k_hours"] = merged_list.per_hour * 100_000  # same rate, scaled to 100,000 hours
print("Data prepared for charting")
merged_list.head()
Data prepared for charting
| | latimes_make | latimes_make_and_model | accidents | total_hours | per_hour | per_100k_hours |
|---|---|---|---|---|---|---|
| 0 | AGUSTA | AGUSTA 109 | 2 | 362172 | 5.522238e-06 | 0.552224 |
| 1 | AIRBUS | AIRBUS 130 | 1 | 1053786 | 9.489593e-07 | 0.094896 |
| 2 | AIRBUS | AIRBUS 135 | 4 | 884596 | 4.521838e-06 | 0.452184 |
| 3 | AIRBUS | AIRBUS 350 | 29 | 3883490 | 7.467510e-06 | 0.746751 |
| 4 | BELL | BELL 206 | 30 | 5501308 | 5.453249e-06 | 0.545325 |
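To check the rate arithmetic by hand, here is a minimal sketch that uses only the five rows in the preview above, typed in as a small stand-in DataFrame called `sample` (a name chosen for illustration). It repeats the two calculations from the setup cell. The `per_100k_hours` values should match the preview.

```python
import pandas as pd

# The five rows from the preview above, typed in by hand as a stand-in
# for merged_list (the real data comes from the two CSV files)
sample = pd.DataFrame(
    {
        "latimes_make_and_model": [
            "AGUSTA 109", "AIRBUS 130", "AIRBUS 135", "AIRBUS 350", "BELL 206",
        ],
        "accidents": [2, 1, 4, 29, 30],
        "total_hours": [362172, 1053786, 884596, 3883490, 5501308],
    }
)

# Accidents per flight hour, then the same rate scaled to 100,000 hours
sample["per_hour"] = sample["accidents"] / sample["total_hours"]
sample["per_100k_hours"] = sample["per_hour"] * 100_000

print(sample[["latimes_make_and_model", "per_100k_hours"]].round(6))
```

Scaling to 100,000 hours only changes the units. It turns tiny numbers like 5.5e-06 into rates that are easier to read on a chart axis.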
import altair as alt
print("Altair imported for data visualization")
Altair imported for data visualization
Note
If the import triggers an error that says your notebook doesn’t have Altair, you can install it by running uv add altair in the terminal. This will download and install the library using the uv package manager.
In a typical analysis, you’d import all of your libraries in one cell at the top of the file. That way, if you need to install or make changes to the packages a notebook uses, you know where to find them and you won’t hit errors importing a package midway through running a file.
Make a basic bar chart
With Altair imported, we can now feed it our DataFrame to make a simple bar chart. Let’s take a look at the basic building block of an Altair chart: the Chart object. We’ll tell it that we want to create a chart from merged_list by passing the DataFrame in:
# This will show an error - Altair needs a "mark" to know how to visualize the data
alt.Chart(merged_list)
---------------------------------------------------------------------------
SchemaValidationError Traceback (most recent call last)
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/altair/vegalite/v5/api.py:4033, in Chart.to_dict(self, validate, format, ignore, context)
4031 copy.data = core.InlineData(values=[{}])
4032 return super(Chart, copy).to_dict(**kwds)
-> 4033 return super().to_dict(**kwds)
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/altair/vegalite/v5/api.py:2004, in TopLevelMixin.to_dict(self, validate, format, ignore, context)
2001 # remaining to_dict calls are not at top level
2002 context["top_level"] = False
-> 2004 vegalite_spec: Any = _top_schema_base(super(TopLevelMixin, copy)).to_dict(
2005 validate=validate, ignore=ignore, context=dict(context, pre_transform=False)
2006 )
2008 # TODO: following entries are added after validation. Should they be validated?
2009 if is_top_level:
2010 # since this is top-level we add $schema if it's missing
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/altair/utils/schemapi.py:1169, in SchemaBase.to_dict(self, validate, ignore, context)
1167 self.validate(result)
1168 except jsonschema.ValidationError as err:
-> 1169 raise SchemaValidationError(self, err) from None
1170 return result
SchemaValidationError: '{'data': {'name': 'data-37aa3d2cc96e41928ba6304b789aa722'}}' is an invalid value.
'mark' is a required property
alt.Chart(...)
OK! We got an error, but don’t panic. The error says that Altair needs a “mark”. In other words, it needs to know not only what data we want to visualize, but also how to represent that data visually. Altair offers lots of different marks, and the Altair documentation lists them all. For now, let’s try out the most versatile mark in our visualization toolbox: the bar.
# Still incomplete - Altair needs to know which columns to use
alt.Chart(merged_list).mark_bar()
That’s an improvement, but the chart isn’t useful yet: Altair doesn’t know which columns of our DataFrame to look at! At a minimum, we also need to define the columns to use for the x- and y-axes. We can do that by chaining on the encode method.
# Basic bar chart with accident rates
alt.Chart(merged_list).mark_bar().encode(
x="latimes_make_and_model",
y="per_100k_hours"
)
That’s more like it!
Here’s an idea: maybe we should use horizontal bars instead of vertical ones. How would you rewrite this chart code to flip those bars?
# Horizontal bar chart
alt.Chart(merged_list).mark_bar().encode(
x="per_100k_hours",
y="latimes_make_and_model"
)
This chart is an okay start, but it’s sorted alphabetically by y-axis value, which is pretty sloppy and hard to visually parse. Let’s fix that.
We want to sort the y-axis values by their corresponding x values. We know how to do that in pandas, but Altair has its own opinions about sorting: by default it orders a categorical axis alphabetically, overriding any sort order on the DataFrame we pass in.
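For reference, this is the pandas version of the order we want. The sketch uses the five preview rows as a stand-in. It sorts by `per_100k_hours` from highest to lowest. Passing the sorted DataFrame to `alt.Chart` would not change the axis order on its own, so the list simply shows the target order for the chart.

```python
import pandas as pd

# The five preview rows, with the per_100k_hours values shown above
sample = pd.DataFrame(
    {
        "latimes_make_and_model": [
            "AGUSTA 109", "AIRBUS 130", "AIRBUS 135", "AIRBUS 350", "BELL 206",
        ],
        "per_100k_hours": [0.552224, 0.094896, 0.452184, 0.746751, 0.545325],
    }
)

# Highest accident rate first
ordered = sample.sort_values("per_100k_hours", ascending=False)
print(ordered["latimes_make_and_model"].tolist())
```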
Sorting charts
Instead, we need to tell Altair how we want the axis organized by setting the sort property of the Y encoding. Passing "-x" sorts the bars by their x values, and the minus sign makes the order descending:
# Sorted horizontal bar chart
alt.Chart(merged_list).mark_bar().encode(
x="per_100k_hours",
y=alt.Y("latimes_make_and_model").sort("-x")
)
Much better! Now we can easily see which helicopter models have the highest accident rates.
Adding titles and labels
Let’s make this chart more presentation-ready by adding a title and better axis labels:
# Chart with title and labels
alt.Chart(merged_list).mark_bar().encode(
x=alt.X("per_100k_hours", title="Accidents per 100,000 flight hours"),
y=alt.Y("latimes_make_and_model", title="Helicopter model").sort("-x")
).properties(
title="Helicopter accident rates by model",
width=500,
height=400
)
Perfect! We now have a professional-looking chart that clearly shows helicopter accident rates by model.
This is just the beginning of what you can do with Altair. The library supports many different chart types, interactive features, and advanced styling options. You can explore more in the Altair documentation.