{ "cells": [ { "cell_type": "markdown", "id": "47b44f67", "metadata": {}, "source": [ "# Export\n", "\n", "After analyzing your data, you'll often want to export your results for use in other applications, share with colleagues, or archive your findings. Pandas provides several methods for exporting DataFrames to different file formats." ] }, { "cell_type": "markdown", "id": "b51bdca7", "metadata": {}, "source": [ "## Setup: Prepare analysis results\n", "\n", "Let's create some analysis results that we can then export:" ] }, { "cell_type": "code", "execution_count": null, "id": "3af2c166", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Load and prepare the data\n", "accident_list = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/ntsb-accidents.csv\")\n", "accident_list[\"latimes_make_and_model\"] = accident_list[\"latimes_make_and_model\"].str.upper()\n", "\n", "# Create accident counts by model\n", "accident_counts = accident_list.groupby([\"latimes_make\", \"latimes_make_and_model\"]).size().rename(\"accidents\").reset_index()\n", "\n", "# Load survey data and merge\n", "survey = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-python-notebook/main/docs/src/_static/faa-survey.csv\")\n", "survey[\"latimes_make_and_model\"] = survey[\"latimes_make_and_model\"].str.upper()\n", "merged_list = pd.merge(accident_counts, survey, on=\"latimes_make_and_model\")\n", "\n", "# Calculate accident rates\n", "merged_list[\"per_100k_hours\"] = (merged_list[\"accidents\"] / merged_list[\"total_hours\"]) * 100_000\n", "\n", "# Sort by accident rate\n", "final_analysis = merged_list.sort_values(\"per_100k_hours\", ascending=False)\n", "\n", "print(f\"Analysis complete: {len(final_analysis)} helicopter models\")\n", "final_analysis.head()" ] }, { "cell_type": "markdown", "id": "bed7d062", "metadata": {}, "source": [ "## Export to CSV\n", "\n", "CSV is the most common format for sharing data:" ] }, { "cell_type": "code", "execution_count": null, "id": "6d7d3d45", "metadata": {}, "outputs": [], "source": [ "# Export to CSV\n", "final_analysis.to_csv(\"helicopter_accident_rates.csv\", index=False)\n", "print(\"Exported to helicopter_accident_rates.csv\")\n", "\n", "# Export with custom options\n", "final_analysis.to_csv(\n", " \"helicopter_analysis_detailed.csv\",\n", " index=False,\n", " float_format='%.2f', # Round floats to 2 decimal places\n", " encoding='utf-8'\n", ")\n", "print(\"Exported detailed version with formatting\")" ] }, { "cell_type": "markdown", "id": "1e4990fd", "metadata": {}, "source": [ "## Export to Excel\n", "\n", "Excel format is useful for sharing with non-technical stakeholders:" ] }, { "cell_type": "code", "execution_count": null, "id": "14e49356", "metadata": {}, "outputs": [], "source": [ "# Export to Excel (requires openpyxl or xlsxwriter)\n", "try:\n", " final_analysis.to_excel(\"helicopter_accident_rates.xlsx\", index=False, sheet_name=\"Accident Rates\")\n", " print(\"Exported to Excel: helicopter_accident_rates.xlsx\")\n", "except ImportError:\n", " print(\"Excel export requires openpyxl. Install with: uv add openpyxl\")" ] }, { "cell_type": "markdown", "id": "67861e26", "metadata": {}, "source": [ "## Export multiple sheets to Excel\n", "\n", "You can create an Excel file with multiple worksheets:" ] }, { "cell_type": "code", "execution_count": null, "id": "82ab1347", "metadata": {}, "outputs": [], "source": [ "# Create summary by manufacturer\n", "by_manufacturer = accident_list.groupby(\"latimes_make\").agg({\n", " \"accident_number\": \"count\",\n", " \"total_fatalities\": \"sum\",\n", " \"total_serious_injuries\": \"sum\"\n", "}).rename(columns={\"accident_number\": \"total_accidents\"})\n", "\n", "try:\n", " # Export multiple sheets\n", " with pd.ExcelWriter(\"helicopter_analysis_complete.xlsx\") as writer:\n", " final_analysis.to_excel(writer, sheet_name=\"Accident Rates\", index=False)\n", " by_manufacturer.to_excel(writer, sheet_name=\"By Manufacturer\")\n", " accident_list.head(100).to_excel(writer, sheet_name=\"Sample Data\", index=False)\n", " \n", " print(\"Exported multi-sheet Excel file: helicopter_analysis_complete.xlsx\")\n", "except ImportError:\n", " print(\"Excel export requires openpyxl. Install with: uv add openpyxl\")" ] }, { "cell_type": "markdown", "id": "573c3219", "metadata": {}, "source": [ "## Export to JSON\n", "\n", "JSON format is useful for web applications and APIs:" ] }, { "cell_type": "code", "execution_count": null, "id": "33d8acbd", "metadata": {}, "outputs": [], "source": [ "# Export to JSON\n", "final_analysis.to_json(\"helicopter_accident_rates.json\", orient=\"records\", indent=2)\n", "print(\"Exported to JSON: helicopter_accident_rates.json\")\n", "\n", "# Show a sample of the JSON structure\n", "sample_json = final_analysis.head(3).to_json(orient=\"records\", indent=2)\n", "print(\"\\nSample JSON structure:\")\n", "print(sample_json)" ] }, { "cell_type": "markdown", "id": "b90c03fe", "metadata": {}, "source": [ "## Export subsets of data\n", "\n", "Often you want to export only specific columns or filtered data:" ] }, { "cell_type": "code", "execution_count": null, "id": "caf47de5", "metadata": {}, "outputs": [], "source": [ "# Export only key columns\n", "summary_columns = [\"latimes_make_and_model\", \"accidents\", \"total_hours\", \"per_100k_hours\"]\n", "final_analysis[summary_columns].to_csv(\"helicopter_summary.csv\", index=False)\n", "print(\"Exported summary with key columns only\")\n", "\n", "# Export top 10 most dangerous models\n", "top_10_dangerous = final_analysis.head(10)\n", "top_10_dangerous.to_csv(\"top_10_dangerous_helicopters.csv\", index=False)\n", "print(\"Exported top 10 most dangerous helicopter models\")" ] }, { "cell_type": "markdown", "id": "b6b72703", "metadata": {}, "source": [ "## Export with custom formatting\n", "\n", "You can customize the output format for better presentation:" ] }, { "cell_type": "code", "execution_count": null, "id": "53949709", "metadata": {}, "outputs": [], "source": [ "# Create a formatted version for presentation\n", "presentation_data = final_analysis.copy()\n", "presentation_data[\"accident_rate_formatted\"] = presentation_data[\"per_100k_hours\"].round(2).astype(str) + \" per 100k hours\"\n", "presentation_data[\"total_hours_formatted\"] = presentation_data[\"total_hours\"].apply(lambda x: f\"{x:,}\")\n", "\n", "# Export formatted version\n", "presentation_columns = [\"latimes_make_and_model\", \"accidents\", \"total_hours_formatted\", \"accident_rate_formatted\"]\n", "presentation_data[presentation_columns].to_csv(\n", " \"helicopter_analysis_formatted.csv\", \n", " index=False,\n", " columns=presentation_columns\n", ")\n", "print(\"Exported formatted version for presentation\")" ] }, { "cell_type": "markdown", "id": "4bdadaec", "metadata": {}, "source": [ "## Verify your exports\n", "\n", "It's good practice to verify that your exports worked correctly:" ] }, { "cell_type": "code", "execution_count": null, "id": "36bc14fc", "metadata": {}, "outputs": [], "source": [ "# Read back the CSV to verify\n", "verification = pd.read_csv(\"helicopter_accident_rates.csv\")\n", "print(f\"Verification: CSV contains {len(verification)} rows and {len(verification.columns)} columns\")\n", "print(\"\\nFirst few rows of exported data:\")\n", "print(verification.head())" ] }, { "cell_type": "markdown", "id": "c65e39a4", "metadata": {}, "source": [ "## Export tips and best practices\n", "\n", "Here are some important considerations when exporting data:" ] }, { "cell_type": "code", "execution_count": null, "id": "45b5051e", "metadata": {}, "outputs": [], "source": [ "print(\"Export Best Practices:\")\n", "print(\"1. Always use index=False for CSV unless you need the index\")\n", "print(\"2. Specify encoding='utf-8' for international characters\")\n", "print(\"3. Use descriptive filenames with dates if appropriate\")\n", "print(\"4. Consider your audience when choosing formats\")\n", "print(\"5. Verify exports by reading them back\")\n", "print(\"6. Round floating point numbers to appropriate precision\")\n", "print(\"7. Include metadata or documentation files when sharing\")\n", "\n", "# Example of filename with timestamp\n", "from datetime import datetime\n", "timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n", "timestamped_filename = f\"helicopter_analysis_{timestamp}.csv\"\n", "print(f\"\\nExample timestamped filename: {timestamped_filename}\")" ] }, { "cell_type": "markdown", "id": "cadaf812", "metadata": {}, "source": [ "Exporting your analysis results is the final step in most data journalism projects. Choose the right format for your audience and always verify that your exports contain the data you expect. This ensures your hard work analyzing the data can be easily shared and used by others." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }