US Campaign Finance Panel
A panel dataset covering 46.3 million individual contribution records from US federal election cycles. Built to demonstrate what data integrity, reproducibility, and publicly shipped quality work look like in practice.
What it is
The dataset consolidates raw FEC contribution files from multiple election cycles into a structured, analysis-ready panel. The pipeline handles deduplication, entity normalization, and candidate-cycle linking at a scale where early design decisions compound quickly.
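As a rough illustration of what deduplication over normalized entities can look like, here is a minimal sketch. It is not the project's actual implementation: the field names (`name`, `zip`, `date`, `amount`, `committee_id`), the normalization rules, and the choice of exact-match keys are all assumptions made for the example.

```python
import re
from typing import Iterable


def normalize_name(raw: str) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", raw)
    return " ".join(cleaned.upper().split())


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each normalized key.

    The key fields below are hypothetical; a real pipeline would pick
    them from the actual contribution schema.
    """
    seen = set()
    unique = []
    for record in records:
        key = (
            normalize_name(record["name"]),
            record["zip"][:5],  # compare on the 5-digit ZIP only
            record["date"],
            record["amount"],
            record["committee_id"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact-match keys like this only catch formatting variants of the same record. Resolving genuinely different spellings of one contributor would need fuzzy or probabilistic matching, which this sketch does not attempt.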
Pipeline architecture
The ETL pipeline runs in seven stages:
- Raw file acquisition and checksum validation
- Schema normalization across cycle-specific formats
- Contributor entity resolution
- Candidate and committee linkage
- Deduplication and anomaly flagging
- Panel construction and indexing
- Output validation and release packaging
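The checksum validation in the first stage could be sketched as follows. This is an assumption-laden example, not the pipeline's real code: it assumes SHA-256 digests and that an expected digest is recorded somewhere for each raw file, and the function names `sha256_of_file` and `validate_checksum` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large raw files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_checksum(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA-256 digest matches the recorded value.

    The expected digest is assumed to come from a manifest kept alongside
    the raw files; surrounding whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Failing fast at this point means a corrupted or silently changed source file stops the run before it reaches the later stages.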
Why it exists
Public campaign finance data is messy, poorly documented, and rarely packaged for serious analysis. This project treats it as a production data engineering problem — not a notebook exercise.