Export¶
After analyzing your data, you’ll often want to export your results for use in other applications, share with colleagues, or archive your findings. Pandas provides several methods for exporting DataFrames to different file formats.
Setup: Prepare analysis results¶
Let’s create some analysis results that we can then export:
import pandas as pd
# Load and prepare the data
accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv")
accident_list["latimes_make_and_model"] = accident_list["latimes_make_and_model"].str.upper()
# Create accident counts by model
accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().rename("accidents").reset_index()
# Load survey data and merge
survey = pd.read_csv("https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/faa-survey.csv")
survey["latimes_make_and_model"] = survey["latimes_make_and_model"].str.upper()
merged_list = pd.merge(accident_counts, survey, on="latimes_make_and_model")
# Calculate accident rates
merged_list["per_100k_hours"] = (merged_list["accidents"] / merged_list["total_hours"]) * 100_000
# Sort by accident rate
final_analysis = merged_list.sort_values("per_100k_hours", ascending=False)
print(f"Analysis complete: {len(final_analysis)} helicopter models")
final_analysis.head()
Analysis complete: 12 helicopter models
| latimes_make | latimes_make_and_model | accidents | total_hours | per_100k_hours | |
|---|---|---|---|---|---|
| 9 | ROBINSON | ROBINSON R44 | 38 | 2359729 | 1.610354 |
| 7 | MCDONNELL DOUGLAS | MCDONNELL DOUGLAS 369 | 6 | 550689 | 1.089544 |
| 6 | HUGHES | HUGHES 369 | 13 | 1201688 | 1.081812 |
| 3 | AIRBUS | AIRBUS 350 | 29 | 3883490 | 0.746751 |
| 8 | ROBINSON | ROBINSON R22 | 20 | 2970806 | 0.673218 |
Export to CSV¶
CSV is the most common format for sharing data:
# Export to CSV
final_analysis.to_csv("helicopter_accident_rates.csv", index=False)
print("Exported to helicopter_accident_rates.csv")
# Export with custom options
final_analysis.to_csv(
"helicopter_analysis_detailed.csv",
index=False,
float_format='%.2f', # Round floats to 2 decimal places
encoding='utf-8'
)
print("Exported detailed version with formatting")
Exported to helicopter_accident_rates.csv
Exported detailed version with formatting
Export to Excel¶
Excel format is useful for sharing with non-technical stakeholders:
# Export to Excel (requires openpyxl or xlsxwriter)
try:
final_analysis.to_excel("helicopter_accident_rates.xlsx", index=False, sheet_name="Accident Rates")
print("Exported to Excel: helicopter_accident_rates.xlsx")
except ImportError:
print("Excel export requires openpyxl. Install with: uv add openpyxl")
Excel export requires openpyxl. Install with: uv add openpyxl
Export multiple sheets to Excel¶
You can create an Excel file with multiple worksheets:
# Create summary by manufacturer
by_manufacturer = accident_list.groupby("latimes_make").agg({
"accident_number": "count",
"total_fatalities": "sum",
"total_serious_injuries": "sum"
}).rename(columns={"accident_number": "total_accidents"})
try:
# Export multiple sheets
with pd.ExcelWriter("helicopter_analysis_complete.xlsx") as writer:
final_analysis.to_excel(writer, sheet_name="Accident Rates", index=False)
by_manufacturer.to_excel(writer, sheet_name="By Manufacturer")
accident_list.head(100).to_excel(writer, sheet_name="Sample Data", index=False)
print("Exported multi-sheet Excel file: helicopter_analysis_complete.xlsx")
except ImportError:
print("Excel export requires openpyxl. Install with: uv add openpyxl")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[4], line 2
1 # Create summary by manufacturer
----> 2 by_manufacturer = accident_list.groupby("latimes_make").agg({
3 "accident_number": "count",
4 "total_fatalities": "sum",
5 "total_serious_injuries": "sum"
6 }).rename(columns={"accident_number": "total_accidents"})
8 try:
9 # Export multiple sheets
10 with pd.ExcelWriter("helicopter_analysis_complete.xlsx") as writer:
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/groupby/generic.py:1432, in DataFrameGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
1429 kwargs["engine_kwargs"] = engine_kwargs
1431 op = GroupByApply(self, func, args=args, kwargs=kwargs)
-> 1432 result = op.agg()
1433 if not is_dict_like(func) and result is not None:
1434 # GH #52849
1435 if not self.as_index and is_list_like(func):
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/apply.py:190, in Apply.agg(self)
187 return self.apply_str()
189 if is_dict_like(func):
--> 190 return self.agg_dict_like()
191 elif is_list_like(func):
192 # we require a list, but not a 'str'
193 return self.agg_list_like()
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/apply.py:423, in Apply.agg_dict_like(self)
415 def agg_dict_like(self) -> DataFrame | Series:
416 """
417 Compute aggregation in the case of a dict-like argument.
418
(...) 421 Result of aggregation.
422 """
--> 423 return self.agg_or_apply_dict_like(op_name="agg")
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/apply.py:1603, in GroupByApply.agg_or_apply_dict_like(self, op_name)
1598 kwargs.update({"engine": engine, "engine_kwargs": engine_kwargs})
1600 with com.temp_setattr(
1601 obj, "as_index", True, condition=hasattr(obj, "as_index")
1602 ):
-> 1603 result_index, result_data = self.compute_dict_like(
1604 op_name, selected_obj, selection, kwargs
1605 )
1606 result = self.wrap_results_dict_like(selected_obj, result_index, result_data)
1607 return result
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/apply.py:462, in Apply.compute_dict_like(self, op_name, selected_obj, selection, kwargs)
460 is_groupby = isinstance(obj, (DataFrameGroupBy, SeriesGroupBy))
461 func = cast(AggFuncTypeDict, self.func)
--> 462 func = self.normalize_dictlike_arg(op_name, selected_obj, func)
464 is_non_unique_col = (
465 selected_obj.ndim == 2
466 and selected_obj.columns.nunique() < len(selected_obj.columns)
467 )
469 if selected_obj.ndim == 1:
470 # key only used for output
File ~/checkouts/readthedocs.org/user_builds/first-python-notebook-vscode/envs/latest/lib/python3.13/site-packages/pandas/core/apply.py:663, in Apply.normalize_dictlike_arg(self, how, obj, func)
661 cols = Index(list(func.keys())).difference(obj.columns, sort=True)
662 if len(cols) > 0:
--> 663 raise KeyError(f"Column(s) {list(cols)} do not exist")
665 aggregator_types = (list, tuple, dict)
667 # if we have a dict of any non-scalars
668 # eg. {'A' : ['mean']}, normalize all to
669 # be list-likes
670 # Cannot use func.values() because arg may be a Series
KeyError: "Column(s) ['accident_number', 'total_serious_injuries'] do not exist"
Export to JSON¶
JSON format is useful for web applications and APIs:
# Export to JSON
final_analysis.to_json("helicopter_accident_rates.json", orient="records", indent=2)
print("Exported to JSON: helicopter_accident_rates.json")
# Show a sample of the JSON structure
sample_json = final_analysis.head(3).to_json(orient="records", indent=2)
print("\nSample JSON structure:")
print(sample_json)
Export subsets of data¶
Often you want to export only specific columns or filtered data:
# Export only key columns
summary_columns = ["latimes_make_and_model", "accidents", "total_hours", "per_100k_hours"]
final_analysis[summary_columns].to_csv("helicopter_summary.csv", index=False)
print("Exported summary with key columns only")
# Export top 10 most dangerous models
top_10_dangerous = final_analysis.head(10)
top_10_dangerous.to_csv("top_10_dangerous_helicopters.csv", index=False)
print("Exported top 10 most dangerous helicopter models")
Export with custom formatting¶
You can customize the output format for better presentation:
# Create a formatted version for presentation
presentation_data = final_analysis.copy()
presentation_data["accident_rate_formatted"] = presentation_data["per_100k_hours"].round(2).astype(str) + " per 100k hours"
presentation_data["total_hours_formatted"] = presentation_data["total_hours"].apply(lambda x: f"{x:,}")
# Export formatted version
presentation_columns = ["latimes_make_and_model", "accidents", "total_hours_formatted", "accident_rate_formatted"]
presentation_data[presentation_columns].to_csv(
"helicopter_analysis_formatted.csv",
index=False,
columns=presentation_columns
)
print("Exported formatted version for presentation")
Verify your exports¶
It’s good practice to verify that your exports worked correctly:
# Read back the CSV to verify
verification = pd.read_csv("helicopter_accident_rates.csv")
print(f"Verification: CSV contains {len(verification)} rows and {len(verification.columns)} columns")
print("\nFirst few rows of exported data:")
print(verification.head())
Export tips and best practices¶
Here are some important considerations when exporting data:
print("Export Best Practices:")
print("1. Always use index=False for CSV unless you need the index")
print("2. Specify encoding='utf-8' for international characters")
print("3. Use descriptive filenames with dates if appropriate")
print("4. Consider your audience when choosing formats")
print("5. Verify exports by reading them back")
print("6. Round floating point numbers to appropriate precision")
print("7. Include metadata or documentation files when sharing")
# Example of filename with timestamp
from datetime import datetime
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
timestamped_filename = f"helicopter_analysis_{timestamp}.csv"
print(f"\nExample timestamped filename: {timestamped_filename}")
Exporting your analysis results is the final step in most data journalism projects. Choose the right format for your audience and always verify that your exports contain the data you expect. This ensures your hard work analyzing the data can be easily shared and used by others.