The Problem
A regional freight operator was running its core supply chain operations on a legacy on-premises database that had been in continuous operation for 15 years. The system contained over 80 million transactional records and was the source of truth for shipment tracking, inventory management, and partner billing.
The legacy system's annual operating cost had reached $180k, and the pool of engineers capable of maintaining its undocumented stored procedures was shrinking rapidly. Business teams were waiting up to 72 hours for analytics queries that cloud systems could answer in under a minute.
Key Constraints
System processes 4,000+ transactions per hour, so no downtime is acceptable during migration
Historical data integrity must be 100% verifiable with cryptographic checksums
Downstream systems depend on the legacy data format
Migration cannot disrupt active shipments for any of the operator's 200+ clients
The Solution
The architecture used a dual-write migration pattern: during the transition, every write to the legacy system was replicated in near real time to BigQuery via a custom Go-based CDC (Change Data Capture) pipeline. This allowed the cloud system to be validated in production against the legacy ground truth before cutover.
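As a rough illustration of validating the cloud copy against the legacy ground truth, the sketch below compares two in-memory snapshots keyed by record ID. The `Row` fields, the `Compare` function, and the sample data are hypothetical and chosen only for this example; the actual validation tooling is not described in the source.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical, simplified view of one replicated record.
type Row struct {
	ID      string
	Status  string
	Version int64 // assumed per-record version counter
}

// Diff lists the record IDs on which the two systems disagree.
type Diff struct {
	MissingInCloud []string
	Mismatched     []string
}

// Compare treats the legacy rows as ground truth and checks that each one
// has an identical counterpart in the cloud copy.
func Compare(legacy, cloud map[string]Row) Diff {
	var d Diff
	for id, want := range legacy {
		got, ok := cloud[id]
		switch {
		case !ok:
			d.MissingInCloud = append(d.MissingInCloud, id)
		case got != want:
			d.Mismatched = append(d.Mismatched, id)
		}
	}
	// Map iteration order is random, so sort for stable reports.
	sort.Strings(d.MissingInCloud)
	sort.Strings(d.Mismatched)
	return d
}

func main() {
	legacy := map[string]Row{
		"S-1": {ID: "S-1", Status: "IN_TRANSIT", Version: 3},
		"S-2": {ID: "S-2", Status: "DELIVERED", Version: 5},
		"S-3": {ID: "S-3", Status: "CREATED", Version: 1},
	}
	cloud := map[string]Row{
		"S-1": {ID: "S-1", Status: "IN_TRANSIT", Version: 3},
		"S-2": {ID: "S-2", Status: "IN_TRANSIT", Version: 4}, // stale copy
	}
	d := Compare(legacy, cloud)
	fmt.Println("missing in cloud:", d.MissingInCloud)
	fmt.Println("mismatched:", d.Mismatched)
}
```

In a check like this, a small and shrinking set of mismatches is expected while replication lag drains; a persistent mismatch would point to a pipeline defect worth investigating before cutover.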
Historical data was batch-migrated using Apache Spark, with a SHA-256 checksum computed for every record so that each migrated row could be verified against its source. All 80M records were migrated and verified over 8 weeks of parallel import jobs.
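A record-level checksum of this kind could look like the following minimal sketch. It assumes a record can be represented as a map of field names to string values and that fields are hashed in sorted key order; the `RecordChecksum` function, the serialization layout, and the sample fields are illustrative assumptions, not the project's actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// RecordChecksum hashes a record's fields in sorted key order so the same
// logical record yields the same digest on the source and target systems.
// Keys and values are length-prefixed so that field boundaries stay
// unambiguous. This layout is an assumption made for the example.
func RecordChecksum(fields map[string]string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		v := fields[k]
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(v), v)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical sample record as read from the source and the target.
	source := map[string]string{"shipment_id": "S-1001", "status": "DELIVERED", "weight_kg": "412.5"}
	target := map[string]string{"weight_kg": "412.5", "status": "DELIVERED", "shipment_id": "S-1001"}

	fmt.Println("match:", RecordChecksum(source) == RecordChecksum(target))

	target["status"] = "IN_TRANSIT" // simulate a corrupted or stale row
	fmt.Println("match after change:", RecordChecksum(source) == RecordChecksum(target))
}
```

Hashing a canonical serialization rather than raw bytes matters here because the source and target systems may store the same values with different column order or encoding.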
Technical Architecture
Dual-Write CDC Pipeline
Go-based CDC agent capturing database journal logs and streaming them to BigQuery. Under 200 ms lag in steady state.
Spark Historical Migration
Apache Spark for bulk migration. SHA-256 checksums on every record; automated reconciliation reports.
Downstream Adapter Layer
Adapter translating legacy message formats to modern REST APIs, transparent to downstream systems.
BigQuery Data Warehouse
Columnar storage with dbt transformation models replacing legacy batch jobs. Analytics query times reduced by 60%.
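To make the downstream adapter layer listed above more concrete, here is a minimal sketch of translating one legacy message into a JSON body. The source does not describe the legacy format, so the pipe-delimited layout `ID|STATUS|CLIENT`, the `Shipment` type, and the `ParseLegacy` and `ToJSON` helpers are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Shipment is a hypothetical JSON shape for the modern API.
type Shipment struct {
	ID     string `json:"id"`
	Status string `json:"status"`
	Client string `json:"client"`
}

// ParseLegacy converts one legacy message into a Shipment. The pipe-delimited
// layout "ID|STATUS|CLIENT" is an assumption made for this example only.
func ParseLegacy(msg string) (Shipment, error) {
	parts := strings.Split(strings.TrimSpace(msg), "|")
	if len(parts) != 3 {
		return Shipment{}, fmt.Errorf("expected 3 fields, got %d", len(parts))
	}
	return Shipment{ID: parts[0], Status: parts[1], Client: parts[2]}, nil
}

// ToJSON renders the shipment as the body an API handler could return.
func ToJSON(s Shipment) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := ParseLegacy("S-1001|DELIVERED|ACME")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	body, err := ToJSON(s)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(body)
}
```

Keeping parsing and encoding as separate functions lets the same parser serve both directions of an adapter, whichever side continues to speak the legacy format.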
"Previously, end-of-month reconciliation processes took several days to complete. They now run in approximately four minutes following the implementation of a modern cloud architecture."

