Data Methodology
Data you can trust.
Methodology you can verify.
Every project in FFID comes from official congressional documents. Here is exactly how we collect, process, and maintain the data — including what we cover, what we do not, and where the known gaps are.
- 97,761+ total projects indexed
- 5 fiscal years: FY2022–FY2026
- 2 chambers: House CPF + Senate CDS
- Update cadence: every Monday
Overview
What are congressional earmarks?
Congressional earmarks are provisions in federal appropriations legislation that direct a specific sum of money to a named recipient for a named project — bypassing the normal competitive grant process. A member of Congress sponsors the project and writes it directly into the appropriations bill. The two active programs are Community Project Funding (CPF) in the House and Congressionally Directed Spending (CDS) in the Senate, both restored in FY2022 after a decade-long ban.
Because recipients are named in legislation rather than selected through an agency RFP process, earmarks give architecture, engineering, and construction (AEC) firms, lobbyists, grant consultants, and business development teams a 12–18 month window before funded projects appear on SAM.gov. FFID tracks 97,761+ CPF and CDS projects from FY2022 through FY2026.
Full earmark FAQ →
Coverage Scope
What FFID tracks — and what it does not
FFID is purpose-built for congressional earmark intelligence. That focus is what makes it useful — and what defines its limits. Understanding both helps you decide whether it fits your workflow.
In scope
- ✓House Community Project Funding (CPF) — FY2022–FY2024 + FY2026 enacted; FY2025 requested (excluded by P.L. 119-4)
- ✓Senate Congressionally Directed Spending (CDS) — FY2022–FY2024 enacted; FY2025–FY2026 requested pipeline
- ✓Project-level records: recipient, member sponsor, dollar amount, state, agency, and status
- ✓Member sponsorship history across fiscal years
- ✓Recipient funding history — normalized across chambers and fiscal years
- ✓Requested vs. enacted lifecycle tracking
- ✓USASpending.gov award match links (where available)
Not in scope
- —Competitive federal grants (RAISE, INFRA, BRIC, formula grants, NSF, NIH awards)
- —Agency discretionary awards not directed by Congress
- —Contract procurement records (use SAM.gov for active solicitations)
- —Loan programs, formula funding, or block grants
- —Congressional staffer or lobbyist contact information
- —FY2027 data (not yet available — will be added as Congress acts)
CPF and CDS earmarks go directly to specific recipients — no competitive application. That is what makes them different from Grants.gov, and why FFID is a complement to competitive grant tools, not a replacement.
Status Lifecycle
Requested vs. enacted — what the status means
Understanding the difference between requested and enacted status is essential for interpreting FFID data correctly. Both statuses are useful — they answer different questions.
Requested: congressional intent, before appropriations are finalized
A member of Congress formally submitted a CPF or CDS request for a specific recipient and dollar amount. The request is in the public record. Appropriations have not yet been finalized — the funding is not confirmed.
- →Use for: identifying demand, tracking member priorities, monitoring the forward pipeline
- →Interpret as: a signal of intent, not a confirmed award
- →Available for: Senate CDS FY2025–FY2026. The House does not publish a comprehensive public requested dataset; the only House CPF records shown as requested are the FY2025 records excluded by P.L. 119-4.
Enacted: confirmed by law, appropriations bill signed
The appropriations bill containing this project was signed into law. The funding is confirmed and directed to the specific recipient. The project now has a legal basis for proceeding toward award or procurement.
- →Use for: confirmed funding signals, recipient intelligence, BD pipeline building
- →Interpret as: confirmed congressional direction — funding is appropriated
- →Available for: House CPF FY2022–FY2024, FY2026; Senate CDS FY2022–FY2024
The enactment rate: what the gap means
Not all requested projects are enacted. Based on historical Senate CDS data, approximately 20% of requested dollar amounts become enacted. This is not a data quality issue — it reflects how the appropriations process works. Congress receives far more requests than it can fund in a given year.
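Because the rate above is stated in terms of requested dollar amounts, it is a dollar-weighted ratio rather than a count of projects. A minimal sketch of that arithmetic follows; the `ProjectAmounts` record and its field names are hypothetical, and the figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectAmounts:
    requested_amount: int  # dollars requested by the member
    enacted_amount: int    # dollars enacted; 0 if the project was not funded


def dollar_enactment_rate(projects: list[ProjectAmounts]) -> float:
    """Share of requested dollars that were enacted (dollar-weighted, not a project count)."""
    requested = sum(p.requested_amount for p in projects)
    if requested == 0:
        return 0.0
    enacted = sum(p.enacted_amount for p in projects)
    return enacted / requested


# Hypothetical figures, for illustration only.
example = [
    ProjectAmounts(requested_amount=1_000_000, enacted_amount=500_000),
    ProjectAmounts(requested_amount=2_000_000, enacted_amount=0),
    ProjectAmounts(requested_amount=2_000_000, enacted_amount=500_000),
]
print(f"{dollar_enactment_rate(example):.0%}")  # prints "20%"
```

In this example two of the three projects received some funding, yet only 20% of requested dollars were enacted, which is why a dollar-weighted rate and a project-count rate can tell very different stories.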
The requested pipeline is a leading signal: it shows where members and recipients are actively trying to direct federal funding. Teams that track requested data alongside enacted records can see funding intent earlier, before the appropriations outcome is known.
Sources
Where the data comes from
All data is sourced directly from official congressional documents. No third-party aggregators, no scraped news articles.
House Appropriations Committee
Enacted
Official Community Project Funding (CPF) tables published with each annual appropriations bill. Enacted data is available for FY2022–FY2024 and FY2026. FY2025 records appear as requested because P.L. 119-4 (SEC. 1111) excluded all CPF from FY2025 appropriations.
Senate Appropriations Committee
Enacted + Requested
Congressionally Directed Spending (CDS) tables published in joint explanatory statements. Enacted data is available through FY2024; requested data is available for FY2025–FY2026.
Joint Explanatory Statements
Supplemental
Conference reports and joint statements that reconcile House and Senate versions of appropriations bills. Used to confirm enacted status and link projects to official appropriations records. JES documents are dense legislative PDFs, and recipient names and amounts in the JES may be abbreviated or aggregated differently than in the originating committee table. FFID uses this source as supplemental confirmation; the committee table remains the authoritative record for individual project details.
USASpending.gov (award matching)
Award confirmation
Where available, FFID links enacted projects to confirmed USASpending.gov awards, providing a direct connection from congressional direction to the federal award record.
Normalization
How raw congressional data becomes searchable intelligence
Congressional documents are not designed for cross-year search or business intelligence. FFID applies normalization to make the data consistent, searchable, and comparable across chambers, fiscal years, and records.
Recipient normalization
Organization names are standardized — legal suffixes stripped (Inc., LLC, Corp.), abbreviation variants resolved, name formatting made consistent. This enables a single recipient to have a searchable funding history across multiple projects, years, and both chambers.
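The suffix-stripping and abbreviation steps described above can be sketched as a small normalizer. This is a minimal illustration, not FFID's actual rule set: the `LEGAL_SUFFIXES` and `ABBREVIATIONS` tables are hypothetical examples, and a production matcher would need many more cases.

```python
import re

# Hypothetical examples; FFID's real suffix and abbreviation lists are not published here.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "ltd"}
ABBREVIATIONS = {"univ": "university", "dept": "department", "ctr": "center", "assn": "association"}


def normalize_recipient(name: str) -> str:
    """Reduce an organization name to a comparable key."""
    # Lowercase, spell out "&", and turn punctuation into spaces.
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    # Resolve known abbreviation variants to one spelling.
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Strip trailing legal suffixes, possibly several in a row ("Co., Inc.").
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_recipient("Acme Water Co., Inc."))  # acme water
print(normalize_recipient("ACME Water, LLC"))       # acme water
```

Both variants collapse to the same key, which is what lets one recipient carry a single funding history across projects, years, and chambers.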
Member normalization
Sponsor names are linked to congressional member records and matched across fiscal years, name format changes, and both chambers. A member's complete CPF and CDS sponsorship history is accessible under one profile regardless of how their name appears in source documents.
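One way to picture member linking is a key function that reduces each printed sponsor name to a comparable form, plus an alias table that maps every known variant to one member record. The sketch below is hypothetical throughout: the honorific list, the `MEMBER_ALIASES` table, and the `M0001` member ID are illustrative, not FFID's schema.

```python
import re
from typing import Optional

HONORIFICS = {"rep", "representative", "sen", "senator", "mr", "mrs", "ms", "dr"}


def name_key(raw: str) -> str:
    """Reduce a sponsor name as printed in a source table to a comparable key."""
    text = re.sub(r"\(.*?\)", " ", raw)  # drop party/state tags such as "(D-CA)"
    if "," in text:  # "Smith, Jane" -> "Jane Smith"
        last, first = text.split(",", 1)
        text = f"{first} {last}"
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(t for t in tokens if t not in HONORIFICS)


# Hypothetical alias table: every known variant key points at one member ID.
MEMBER_ALIASES = {
    "jane smith": "M0001",
    "jane q smith": "M0001",
}


def resolve_member(raw: str) -> Optional[str]:
    """Return the member ID for a printed name, or None if the variant is unknown."""
    return MEMBER_ALIASES.get(name_key(raw))


print(resolve_member("Smith, Jane"))                # M0001
print(resolve_member("Rep. Jane Q. Smith (D-CA)"))  # M0001
```

Name-only keys can collide when two members share a name, so a real linker would also compare state and chamber; that step is omitted here.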
Agency taxonomy
Projects are tagged to one of 30 agency codes (DOT, EPA, HUD, USACE, etc.) using a 3-level taxonomy: agency → category → use case. This allows filtering by agency, by sector (Transportation, Water, Housing), or by project type (road rehabilitation, wastewater treatment, affordable housing).
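The three-level taxonomy can be modeled as a tag with one field per level, where a filter may constrain any combination of levels. This is a minimal sketch; the class names are hypothetical, and the agency/category/use-case pairings in the sample data are illustrative rather than taken from FFID's taxonomy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonomyTag:
    agency: str    # level 1: agency code, e.g. "DOT"
    category: str  # level 2: sector, e.g. "Transportation"
    use_case: str  # level 3: project type, e.g. "road rehabilitation"


@dataclass
class TaggedProject:
    title: str
    tag: TaxonomyTag


def filter_projects(
    projects: list[TaggedProject],
    agency: Optional[str] = None,
    category: Optional[str] = None,
    use_case: Optional[str] = None,
) -> list[TaggedProject]:
    """Keep projects that match every level supplied; None means 'any value'."""
    def matches(p: TaggedProject) -> bool:
        return (
            (agency is None or p.tag.agency == agency)
            and (category is None or p.tag.category == category)
            and (use_case is None or p.tag.use_case == use_case)
        )
    return [p for p in projects if matches(p)]


# Hypothetical projects; the pairings below are illustrative.
projects = [
    TaggedProject("Main St. repaving", TaxonomyTag("DOT", "Transportation", "road rehabilitation")),
    TaggedProject("Plant upgrade", TaxonomyTag("EPA", "Water", "wastewater treatment")),
    TaggedProject("Senior housing", TaxonomyTag("HUD", "Housing", "affordable housing")),
]
print([p.title for p in filter_projects(projects, category="Water")])  # ['Plant upgrade']
```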
Geographic normalization
State codes, congressional district identifiers, and city/county references are standardized. State-level pages aggregate all projects for a given state across both chambers and all fiscal years. Filtering by state is consistent regardless of how the source document listed the location.
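A minimal sketch of the kind of geographic standardization described above, assuming two-letter USPS state codes and a `ST-NN` district format. The partial `STATE_CODES` table and the `-AL` at-large convention are hypothetical choices for illustration, not FFID's documented format.

```python
import re

# Partial lookup for illustration; a complete table would cover every state and territory.
STATE_CODES = {"alaska": "AK", "california": "CA", "new york": "NY", "texas": "TX"}

AT_LARGE = {"at-large", "at large", "al", "0", "00"}


def normalize_state(raw: str) -> str:
    """Return a two-letter USPS code from a code ("tx", "N.Y.") or a full name ("Texas")."""
    text = raw.strip().replace(".", "")
    if len(text) == 2 and text.isalpha():
        return text.upper()
    return STATE_CODES[text.lower()]  # raises KeyError for names not in the table


def normalize_district(state: str, district: str) -> str:
    """Format a House district as 'CA-07'; at-large seats become 'AK-AL' (a hypothetical convention)."""
    code = normalize_state(state)
    d = district.strip().lower()
    if d in AT_LARGE:
        return f"{code}-AL"
    number = int(re.sub(r"\D", "", d))  # keep digits only: "7th" -> 7, "District 12" -> 12
    return f"{code}-{number:02d}"


print(normalize_district("California", "7th"))  # CA-07
print(normalize_district("Alaska", "At-Large"))  # AK-AL
```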
Status normalization
Each project is assigned a canonical status — requested or enacted — based on which document it came from and whether the appropriations bill was signed into law. Status is not inferred or estimated; it reflects the official source.
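The status rule above depends on two facts per record: what kind of document it came from, and whether the bill carrying it became law. The sketch below expresses that rule under stated assumptions; the `SourceRecord` fields and the `source_kind` values are hypothetical names. The FY2025 House example reflects the exception described under Known Limitations: those records came from House-passed bills that never became law.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Literal


class Status(Enum):
    REQUESTED = "requested"
    ENACTED = "enacted"


@dataclass
class SourceRecord:
    chamber: Literal["House", "Senate"]
    fiscal_year: int
    # Hypothetical field: which kind of document the record was parsed from.
    source_kind: Literal["request_table", "appropriations_bill"]
    # Whether the appropriations bill carrying the project became law.
    signed_into_law: bool


def canonical_status(rec: SourceRecord) -> Status:
    """Enacted only if the record comes from bill text AND that bill was signed; nothing is inferred."""
    if rec.source_kind == "appropriations_bill" and rec.signed_into_law:
        return Status.ENACTED
    return Status.REQUESTED


# FY2025 House CPF: the House-passed bill never became law, so the record stays requested.
house_fy25 = SourceRecord("House", 2025, "appropriations_bill", signed_into_law=False)
print(canonical_status(house_fy25).value)  # requested
```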
Deduplication
Cryptographic import hashes prevent double-counting across pipeline runs, source document updates, and re-processed records. Projects that appear in both chamber documents for the same fiscal year are tracked with their origin chamber clearly identified.
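Hash-based deduplication can be sketched as an upsert keyed by a SHA-256 digest of a record's identifying fields. Which fields FFID actually hashes is not stated here; `KEY_FIELDS` is an assumption. Including `chamber` keeps the same project from both chambers as two records with their origin chamber identified, and leaving out the amount lets a corrected amount replace a record instead of duplicating it.

```python
import hashlib
import json

# Assumed identifying fields; "recipient" is expected to be normalized already.
KEY_FIELDS = ("chamber", "fiscal_year", "recipient", "title")


def import_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the identifying fields."""
    key = {field: record[field] for field in KEY_FIELDS}
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert(batch: list[dict], store: dict[str, dict]) -> int:
    """Insert or replace records keyed by import hash; return how many were new."""
    new = 0
    for record in batch:
        h = import_hash(record)
        if h not in store:
            new += 1
        store[h] = record  # a re-processed or updated record replaces the earlier version
    return new


rec = {"chamber": "Senate", "fiscal_year": 2024, "recipient": "acme water",
       "title": "Water main replacement", "amount": 1_000_000}
store: dict[str, dict] = {}
print(upsert([rec], store), upsert([rec], store))  # 1 0  (a repeated run adds nothing)
```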
Pipeline
How the data pipeline works
Collect
Source data directly from official congressional documents — committee tables, joint explanatory statements, and appropriations reports published by the House and Senate Appropriations Committees.
Extract
Parse project titles, recipient names, dollar amounts, states, congressional districts, agencies, and fiscal years from structured and unstructured legislative text. Each document is archived to immutable cloud storage before parsing.
Normalize
Standardize recipient names, geographic identifiers, and agency codes for consistent cross-year search and export. Resolve organization name variants and strip legal suffixes to produce a single searchable identity per recipient.
Validate
Automated validation gate checks error rates, row counts, and floor thresholds before any data is committed. Pipeline exits with an error if checks fail — no bad data reaches the database.
Update
Pipeline runs every Monday to ingest newly published data. New fiscal year data is added as Congress acts and official documents become available.
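The Validate step can be pictured as a gate that collects every failed check and exits non-zero before anything is committed. This is a sketch under assumptions: the thresholds are hypothetical, and because the page does not define the row-count and floor checks precisely, here the row-count check compares against the previous run and the floor is an absolute minimum.

```python
import sys
from dataclasses import dataclass

# Hypothetical thresholds; FFID's real values are not published on this page.
MAX_ERROR_RATE = 0.01   # at most 1% of rows may fail to parse
MIN_ROWS = 1_000        # floor: an absolute minimum row count for a run
MAX_SHRINK = 0.05       # row-count check: at most 5% fewer rows than the previous run


@dataclass
class RunStats:
    rows_parsed: int
    rows_failed: int
    previous_row_count: int


def gate_failures(stats: RunStats) -> list[str]:
    """Return every failed check; an empty list means the run may be committed."""
    failures = []
    total = stats.rows_parsed + stats.rows_failed
    error_rate = stats.rows_failed / total if total else 1.0
    if error_rate > MAX_ERROR_RATE:
        failures.append(f"error rate {error_rate:.1%} exceeds {MAX_ERROR_RATE:.0%}")
    if stats.rows_parsed < MIN_ROWS:
        failures.append(f"{stats.rows_parsed} rows is below the floor of {MIN_ROWS}")
    if stats.rows_parsed < stats.previous_row_count * (1 - MAX_SHRINK):
        failures.append(f"row count fell from {stats.previous_row_count} to {stats.rows_parsed}")
    return failures


def run_gate(stats: RunStats) -> None:
    """Exit with status 1 if any check fails, so the commit step never runs."""
    failures = gate_failures(stats)
    if failures:
        for failure in failures:
            print(f"validation failed: {failure}", file=sys.stderr)
        sys.exit(1)
```

A caller would invoke `run_gate(stats)` immediately before the commit step; reporting all failures at once, rather than stopping at the first, makes a bad run easier to diagnose.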
Quality Checks
How we ensure accuracy
Source archiving
Every source document is fetched from official committee sources and archived to immutable cloud storage before parsing. The raw source is always preserved.
Deduplication
Cryptographic import hashes prevent double-counting across chambers, fiscal years, and repeated pipeline runs. Each project has a stable canonical identity.
Automated validation
Each pipeline run passes an automated gate — error rate, row count, and floor threshold checks — before data is committed. No data reaches the database if the gate fails.
Updates
When data is updated
Weekly — every Monday
Automated pipeline ingests newly published congressional documents and adds new projects. A "Data updated" timestamp on the search page reflects the most recent successful run.
Annual — as Congress acts
New fiscal year data is added as appropriations bills are enacted and official documents are published. Senate CDS requested data is added when the committee publishes its request tables.
Known Limitations
Where the data has gaps
Transparency about limitations builds trust, not skepticism. These are the gaps we know about and are working to address.
Senate CDS FY2025–FY2026
Senate CDS enacted data is currently available through FY2024. FY2025 and FY2026 Senate records in the database are requested (pipeline) data, not confirmed enacted funding. Enacted data will be added when the appropriations bills are signed and official documents published.
House CPF has no public requested data
Unlike the Senate, the House does not publish a comprehensive public requested dataset. FY2022–FY2024 and FY2026 House CPF records are enacted. FY2025 is an exception: P.L. 119-4 (SEC. 1111, signed March 15, 2025) excluded all CPF from FY2025 appropriations, so those 5,157 records appear as requested even though they originated from House-passed appropriations bills.
Recipient coverage is partial
Recipient normalization and UEI matching are ongoing, and some organizations still appear under multiple name variants. We are continuously improving match quality; approximately 44% of recipients have confirmed Unique Entity IDs (UEIs) from SAM.gov.
Award match coverage is partial
USASpending.gov award matches are available for a portion of enacted projects (approximately 10%). Not every earmark produces a standalone USASpending award record — some are bundled or administered differently.
No competitive grant coverage
FFID covers CPF and CDS earmarks only. Competitive grants, formula funding, RAISE/INFRA/BRIC programs, and other non-earmark federal funding are not included. Use Grants.gov or USASpending for those.
JES matching and amount deviation
Joint Explanatory Statement (JES) data — used to confirm enacted status and link projects to official appropriations records — is sourced from dense conference report PDFs where recipient names and dollar amounts are sometimes abbreviated, combined across subaccounts, or formatted differently than in the originating committee tables. FFID uses fuzzy text matching to link JES entries to projects, and while medium-confidence matches undergo human review, some variance in matched amounts or recipient identity is inherent to the source material. When precision matters, verify directly against the official committee table for that fiscal year.
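The fuzzy-matching step above can be sketched with Python's standard `difflib.SequenceMatcher`, sorting each candidate link into auto-accept, human review, or reject. FFID's actual matcher and cutoffs are not published; the thresholds here are hypothetical, and `SequenceMatcher` is a stand-in chosen for illustration. A strong name match with a large amount gap is routed to review rather than linked automatically.

```python
from difflib import SequenceMatcher

# Hypothetical cutoffs; not FFID's published values.
HIGH_CONFIDENCE = 0.90
MEDIUM_CONFIDENCE = 0.75
MAX_AMOUNT_DEVIATION = 0.10  # 10% relative difference between JES and committee amounts


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio, computed case-insensitively."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_jes_match(jes_name: str, project_name: str, jes_amount: int, project_amount: int) -> str:
    """Return 'auto' (link), 'review' (send to a human), or 'reject'."""
    score = name_similarity(jes_name, project_name)
    deviation = abs(jes_amount - project_amount) / project_amount if project_amount else 1.0
    if score >= HIGH_CONFIDENCE and deviation <= MAX_AMOUNT_DEVIATION:
        return "auto"
    if score >= MEDIUM_CONFIDENCE:
        return "review"  # medium confidence, or a strong name match with a large amount gap
    return "reject"


print(classify_jes_match("City of Springfield", "City of Springfield", 1_500_000, 1_000_000))  # review
```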
Army Corps deduplication
A known set of Army Corps projects share similar project keys across different geographic accounts. These records are tracked internally and do not affect user-visible search results, but may affect some aggregations.
Ready to search the data?
97,761+ projects indexed. No account required to start.
Questions about our data?
We are happy to explain our methodology in detail — including coverage gaps, normalization choices, and data sources.
Contact us →