Pandas

Python is filled with functions to do pretty much anything you’d ever want to do with a programming language: navigate the web, parse data, interact with a database, run fancy statistics, build a pretty website and so much more.

Creative people have put these tools to work to get a wide range of things done in the academy, the laboratory and even in outer space. Some are included in a toolbox that comes with the language, known as the standard library. Others have been built by members of Python’s developer community and need to be downloaded and installed from the web.

One third-party tool that’s important for this class is called pandas. It was invented for use at a financial investment firm and has become the leading open-source library for accessing and analyzing data in many different fields.

Import pandas

Type the following into your notebook and run it:

import pandas

If nothing happens, that’s good. It means you have pandas installed and ready to use.

Note

Since pandas is created by a third party independent from the core Python developers, it isn’t installed by default in a basic Python installation.

If you followed the VS Code setup chapter and used uv to set up your project, pandas should already be installed. If your Python environment doesn’t have pandas, you can install it by opening the VS Code terminal (`View > Terminal`) and running `uv add pandas`.

Return to your import section and rewrite it like this:

import pandas as pd

This renames the pandas library to pd, which is a common shortcut you’ll frequently see in the wild. Think of it as a nickname.
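If you want to convince yourself that the nickname is nothing more than a second name for the same library, here is a small sketch. It imports pandas both ways purely for the comparison, which is not something you would normally do in your notebook.

```python
import pandas
import pandas as pd

# Both names point at the very same module object.
print(pd is pandas)  # True

# Anything reachable through one name is reachable through the other.
# The version string you see depends on what is installed on your machine.
print(pd.__version__)
```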

Create a Series

Now let’s get a look at what pandas can do. Begin by making a simple list of numbers:

my_list = [1, 2, 3, 4, 5]
print(my_list)
[1, 2, 3, 4, 5]

Next, convert that plain Python list into what pandas calls a Series. Let’s stick with simple variable names and call it my_series.

my_series = pd.Series(my_list)
print(my_series)
0    1
1    2
2    3
3    4
4    5
dtype: int64

Voilà! Your first pandas object. A Series is a data type provided by pandas. For our purposes you can consider it an enhanced version of a Python list.

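Compare the output above to what we saw when we printed a list. Here, each item has an index number (0, 1, 2, 3, 4) printed beside it, and the Series as a whole reports a dtype, which stands for data type. Every value in this Series is stored as a 64-bit integer, which is what int64 means.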
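The index and the dtype are not just decoration in the printout. You can ask the Series for each of them directly. The sketch below rebuilds the same Series so it can run on its own; the exact dtype name can vary with your platform and pandas version, though int64 is what you should typically see.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5])

# The index labels pandas assigned automatically, converted to a plain list.
print(list(my_series.index))  # [0, 1, 2, 3, 4]

# The data type shared by every value in the Series.
print(my_series.dtype)

# Look up a single value by its index label.
print(my_series[2])  # 3

# Like a list, a Series knows how many items it holds.
print(len(my_series))  # 5
```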

Like many things you’ll encounter in Python, a pandas Series comes with numerous methods that are packaged together with the data. Let’s try a few.

How about sorting the series from highest to lowest with the sort_values method?

my_series.sort_values(ascending=False)
4    5
3    4
2    3
1    2
0    1
dtype: int64
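Notice that the index numbers traveled along with their values when the order flipped. It is also worth knowing that sort_values hands back a new Series and leaves the original alone. Here is a short sketch of both points; the variable name sorted_series is just one chosen for this illustration.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5])

# Save the sorted result under a new name.
sorted_series = my_series.sort_values(ascending=False)

# The new Series is in descending order.
print(sorted_series.tolist())  # [5, 4, 3, 2, 1]

# Each value kept its original index label.
print(sorted_series.index.tolist())  # [4, 3, 2, 1, 0]

# The original Series is unchanged.
print(my_series.tolist())  # [1, 2, 3, 4, 5]
```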

Or how about a statistical summary of the data with describe?

my_series.describe()
count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64

In just a few lines of code, pandas was able to calculate the count, mean, standard deviation, minimum, maximum and quartiles of our data, which is something that would take quite a bit more work if we had to calculate it all by hand.
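Each number in that summary is also available on its own. As a rough sketch, the code below pulls out a few of them one at a time and checks the standard deviation against Python’s built-in statistics module, which is one way you might do the math without pandas. Both compute the sample standard deviation.

```python
import statistics

import pandas as pd

my_list = [1, 2, 3, 4, 5]
my_series = pd.Series(my_list)

# Individual statistics are available as their own methods.
print(my_series.mean())    # 3.0
print(my_series.median())  # 3.0
print(my_series.std())     # roughly 1.581139

# The standard library gives the same standard deviation for the plain list.
print(statistics.stdev(my_list))

# describe returns a Series too, so you can pick out one entry by its label.
summary = my_series.describe()
print(summary["max"])  # 5.0
```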

But this is just the beginning.

Next you’ll learn how to import larger, more realistic datasets for analysis.