Data Methodology

Name: US Domestic Manufacturer Database
Creator: BOMForge, Inc.
License: Proprietary

This page describes how the demonstration platform collects and presents domestic manufacturer data. The platform is a BOMForge™ technology demo.

Data Sources

SAM.gov Entity Management API: UEI, CAGE codes, registration status, NAICS classifications for all registered government contractors
State manufacturing directories: Official economic development agency databases across all 50 states
Public company filings and registrations: Secretary of State records, business licenses
Industry association membership directories: NTMA, PMA, AMT, SME, and sector-specific organizations
Federal procurement history: Historical contract awards from FPDS-NG linked to supplier profiles

Update Frequency

SAM.gov entity sync: daily automated pull via SAM Entity Management API
Web enrichment pipeline: weekly crawl and NLP extraction across manufacturer websites
Full dataset re-index: monthly comprehensive refresh with embedding regeneration
Real-time corrections: user-reported data issues triaged within 24 hours
Capability taxonomy updates: quarterly review aligned with NAICS revision cycles
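
The cadences above can be expressed as a small schedule table. The following is a minimal sketch, not the platform's actual scheduler: the job names, the `jobs_due` helper, and the approximation of "monthly" as 30 days and "quarterly" as 91 days are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical job names mapped to the cadences listed above.
# "Monthly" and "quarterly" are approximated as fixed day counts (an assumption).
REFRESH_INTERVALS = {
    "sam_entity_sync": timedelta(days=1),
    "web_enrichment": timedelta(weeks=1),
    "full_reindex": timedelta(days=30),
    "taxonomy_review": timedelta(days=91),
}


def jobs_due(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the jobs whose interval has elapsed, in schedule-table order.

    A job with no recorded run is treated as due.
    """
    due = []
    for job, interval in REFRESH_INTERVALS.items():
        previous = last_run.get(job)
        if previous is None or now - previous >= interval:
            due.append(job)
    return due


now = datetime(2024, 3, 1)
last_run = {
    "sam_entity_sync": now - timedelta(hours=2),   # ran today, not due
    "web_enrichment": now - timedelta(days=8),     # more than a week ago, due
    "full_reindex": now - timedelta(days=10),      # within the month, not due
}
print(jobs_due(last_run, now))
```

In this example the weekly crawl is overdue and the taxonomy review has never run, so those two jobs are reported as due.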

Quality Assurance

Multi-source cross-referencing: capabilities verified across two or more independent sources
NAICS code validation: automated classification verified against stated capabilities and SIC crosswalk
Geographic verification: address data validated against USPS databases and geocoded for spatial queries
Certification validation: compliance claims cross-referenced with accreditation body databases where available
Confidence scoring: each data point assigned a confidence level based on source count and recency
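
To illustrate how a confidence level could combine source count and recency, here is a minimal sketch. The formula is an assumption, not the platform's documented method: the saturating source factor, the 180-day half-life, and the level thresholds are all hypothetical values chosen for the example.

```python
from datetime import date, timedelta

HALF_LIFE_DAYS = 180  # assumed: confidence halves every 180 days without re-verification


def confidence_score(source_count: int, last_verified: date, today: date) -> float:
    """Combine source count and recency into a score in [0, 1)."""
    if source_count <= 0:
        return 0.0
    # Each additional independent source halves the remaining uncertainty.
    source_factor = 1.0 - 0.5 ** source_count
    # Exponential decay with the age of the most recent verification.
    age_days = max((today - last_verified).days, 0)
    recency_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return source_factor * recency_factor


def confidence_level(score: float) -> str:
    """Map a numeric score to a coarse label (thresholds are assumptions)."""
    if score >= 0.6:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"


today = date(2024, 1, 1)
fresh = confidence_score(2, today, today)                        # two sources, verified today
stale = confidence_score(1, today - timedelta(days=180), today)  # one source, 180 days old
print(fresh, confidence_level(fresh))
print(stale, confidence_level(stale))
```

Under these assumptions a data point confirmed by two sources today scores 0.75 ("high"), while a single-source data point last verified 180 days ago scores 0.25 ("low").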

Search Technology

Hybrid search: 70% semantic vector similarity (1536-dimensional embeddings) combined with 30% full-text relevance scoring
Fallback chain: vector search, then full-text search, then name match, then capability-based search, so that every query returns results
NAICS auto-classification: NLP pipeline assigns NAICS codes from unstructured company descriptions with hierarchical matching
Capability extraction: pattern-based NLP identifies 50+ standardized manufacturing capability categories from free text
Natural language queries: user intent parsed and translated to structured filters across capabilities, certifications, materials, and geography
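
The 70/30 weighting and the fallback chain described above can be sketched as follows. This is an illustration under stated assumptions rather than the platform's code: both input scores are assumed to be normalized to the range 0 to 1, the chain is assumed to move on only when a stage returns nothing, and the function names and stage stubs are hypothetical.

```python
from typing import Callable, Sequence

SEMANTIC_WEIGHT = 0.7   # share given to vector similarity
FULLTEXT_WEIGHT = 0.3   # share given to full-text relevance


def hybrid_score(semantic: float, fulltext: float) -> float:
    """Weighted blend of two scores, each assumed normalized to [0, 1]."""
    return SEMANTIC_WEIGHT * semantic + FULLTEXT_WEIGHT * fulltext


def rank_candidates(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Order candidate names by hybrid score, best first."""
    return sorted(scores, key=lambda name: hybrid_score(*scores[name]), reverse=True)


SearchStage = Callable[[str], list[str]]


def search_with_fallback(
    query: str, stages: Sequence[tuple[str, SearchStage]]
) -> tuple[str, list[str]]:
    """Run stages in order and return the first non-empty result set.

    Returns the name of the stage that answered, or ("none", []) if every
    stage came back empty.
    """
    for name, stage in stages:
        results = stage(query)
        if results:
            return name, results
    return "none", []


# Stub stages standing in for the real search backends (hypothetical data).
stages = [
    ("vector", lambda q: []),
    ("full_text", lambda q: ["Acme Machining"]),
    ("name_match", lambda q: ["not reached"]),
    ("capability", lambda q: ["not reached"]),
]
print(search_with_fallback("5-axis CNC titanium", stages))
print(rank_candidates({"A": (0.9, 0.1), "B": (0.4, 1.0)}))
```

Here the vector stage returns nothing, so the full-text stage answers and the later stages never run. Returning the stage name alongside the results makes it possible to log which part of the chain served each query.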

Transparency and Auditability

Source attribution: every data point traceable to its original source with timestamp
Audit trail: all search queries and results logged to support Inspector General oversight
Evidence bundles: exportable packages containing search parameters, results, source documentation, and methodology for waiver support
Data provenance: full lineage from raw source through enrichment pipeline to final indexed record
Version history: changes to manufacturer profiles tracked with before/after snapshots
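
As an illustration of version history with before/after snapshots and source attribution, the sketch below records only the fields that changed between two versions of a profile. The record layout, the function names, and the sample values are hypothetical and are not taken from the platform.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def diff_profile(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return {field: {"before": old, "after": new}} for every changed field.

    A field missing from one snapshot is reported as None on that side.
    """
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = {"before": old, "after": new}
    return changes


def make_version_entry(
    before: dict[str, Any],
    after: dict[str, Any],
    source: str,
    changed_at: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build one history entry with a UTC timestamp and the originating source."""
    timestamp = changed_at or datetime.now(timezone.utc)
    return {
        "changed_at": timestamp.isoformat(),
        "source": source,
        "changes": diff_profile(before, after),
    }


# Sample profile data is made up for the example.
old_profile = {"name": "Acme Machining", "naics": "332710"}
new_profile = {"name": "Acme Machining", "naics": "332721", "cage": "1ABC2"}
entry = make_version_entry(
    old_profile, new_profile, source="sam_entity_sync",
    changed_at=datetime(2024, 3, 1, tzinfo=timezone.utc),
)
print(entry)
```

Storing only the changed fields keeps each entry small, and the full profile at any point can be rebuilt by replaying entries in timestamp order.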

Scale and Infrastructure

Dataset: continuously growing set of domestic manufacturers, indexed with full capability profiles
Infrastructure: PostgreSQL with pgvector extension on managed cloud infrastructure, auto-scaling to handle agency-wide query volumes
API capacity: sub-second response times at sustained 1,000+ queries per minute
Modular architecture: data ingestion, enrichment, search, and reporting run as separate services that scale independently
Continuous improvement: ML feedback loop improves search relevance and data quality over time based on usage patterns