US Campaign Finance Panel

November 2024

A panel dataset of 46.3 million individual contribution records from US federal election cycles, built to demonstrate what data integrity, reproducibility, and publicly shipped quality work look like in practice.

What it is

The dataset consolidates raw FEC contribution files from multiple election cycles into a structured, analysis-ready panel. The pipeline handles deduplication, entity normalization, and candidate-cycle linking at a scale where early design decisions compound quickly.
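
As a rough illustration of what entity normalization and deduplication can look like, here is a minimal Python sketch. It is not the project's actual code: the record fields (name, zip, date, amount) and the choice of dedup key are assumptions made for the example, and a real pipeline would likely need a stricter key, since the same donor can legitimately give the same amount twice on one day.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return an uppercase, punctuation-free, whitespace-collapsed name."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit, or space with a space.
    text = re.sub(r"[^A-Za-z0-9 ]+", " ", text)
    return " ".join(text.upper().split())


def dedup_key(record: dict) -> tuple:
    """Build a comparison key from hypothetical record fields."""
    return (
        normalize_name(record["name"]),
        record["zip"][:5],  # compare on the 5-digit ZIP only (assumption)
        record["date"],
        record["amount"],
    )


def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Normalizing before comparing is what lets "SMITH, JOHN" and "Smith John" collapse to one contributor; at tens of millions of rows, the choice of key decides how many true duplicates are caught and how many distinct gifts are merged by mistake.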

Pipeline architecture

The ETL pipeline runs in seven stages:

  1. Raw file acquisition and checksum validation
  2. Schema normalization across cycle-specific formats
  3. Contributor entity resolution
  4. Candidate and committee linkage
  5. Deduplication and anomaly flagging
  6. Panel construction and indexing
  7. Output validation and release packaging
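
To make the first stage concrete, the sketch below shows one way checksum validation of downloaded raw files could work. It assumes a manifest that maps file names to expected SHA-256 digests; the manifest format, the function names, and the choice of SHA-256 are all hypothetical and not taken from the project.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(manifest: dict, root) -> list:
    """Return the names of files that are missing or whose digest does not match."""
    failures = []
    for name, expected in manifest.items():
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from validate_files means every file in the manifest is present and unmodified, which gives later stages a fixed, verifiable input to reproduce from.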

Why it exists

Public campaign finance data is messy, poorly documented, and rarely packaged for serious analysis. This project treats it as a production data engineering problem, not a notebook exercise.
