Migrating a 15-Year-Old Supply Chain Database to the Cloud — Logistics Case Study

Context

The Problem

A regional freight operator was running its core supply chain operations on a legacy on-premises database that had been in continuous operation for 15 years. The system contained over 80 million transactional records and was the source of truth for shipment tracking, inventory management, and partner billing.

The legacy system's annual operating cost had reached $180k, and the pool of engineers capable of maintaining its undocumented stored procedures was shrinking rapidly. Business teams were waiting up to 72 hours for analytics queries that cloud systems could answer in under a minute.

Key Constraints

System processes 4,000+ transactions/hour — zero downtime acceptable during migration

Historical data integrity must be 100% verifiable with cryptographic checksums

Downstream systems depend on the legacy data format

Migration cannot disrupt active shipments for any of their 200+ clients

Architecture

The Solution

The architecture utilized a dual-write migration pattern — during the transition, every write to the legacy system was simultaneously replicated to BigQuery via a custom Go-based CDC (Change Data Capture) pipeline. This allowed validation of the cloud system in production against the legacy ground truth before cutover.

Historical data was batch-migrated using Apache Spark, with SHA-256 checksums computed at the record level to guarantee integrity. All 80M records were migrated and verified over 8 weeks of parallel import jobs.

Technical Architecture

sync_alt

Dual-Write CDC Pipeline

Go-based CDC agent capturing database journal logs and streaming to BigQuery. < 200ms lag during steady state.

local_fire_department

Spark Historical Migration

Apache Spark for bulk migration. SHA-256 checksums on every record; automated reconciliation reports.

hub

Downstream Adapter Layer

Adapter translating legacy message formats to modern REST APIs — transparent to downstream systems.

analytics

BigQuery Data Warehouse

Columnar storage with dbt transformation models replacing legacy batch jobs. Analytics query times reduced by 60%.

"Previously, end-of-month reconciliation processes took several days to complete. They now run in approximately four minutes following the implementation of a modern cloud architecture."