Skip to main content
All Case Studies
Enterprise / Insurance

Data Platform Modernization: 6-Hour Reports Now Run in 4 Minutes

A mid-size insurance company was running on 10-year-old on-prem SQL Server with manual ETL jobs that broke every week. We migrated them to a lakehouse on AWS, and reports that took 6 hours now finish in 4 minutes.

4 minReport Generation (was 6 hours)

The Challenge

What was getting in the way

  1. 01

    The core data warehouse ran on SQL Server 2014 hosted on-prem. Queries for monthly reports took up to 6 hours

  2. 02

    ETL jobs were built in SSIS and maintained by one person. When that person was on leave, broken jobs just stayed broken

  3. 03

    No one trusted the data. Finance, underwriting, and claims teams each kept their own Excel spreadsheets as the 'real' numbers

The Solution

How we solved it

We migrated the data platform to a lakehouse architecture on AWS. Raw data lands in S3 via CDC connectors from the SQL Server and three other source systems. Spark jobs on EMR Serverless handle transforms, dbt manages the modeling layer, and everything lands in a Redshift Serverless warehouse for analysts. We replaced the SSIS jobs with Airflow DAGs with built-in alerting. The whole pipeline runs on a schedule with data quality checks at every stage. The team went from 'I don't trust these numbers' to self-service dashboards within 10 weeks.

Technologies

AWS S3
Apache Spark
dbt
Redshift
Airflow
CDC
Terraform

What We Built

A look inside the project

Data Pipeline Monitor
All Pipelines Healthy
Last refresh: 12s ago
PipelineStatusLast RunRecordsDuration
Claims ETL
Completed
2 min ago
124,847 rows
4m 12s
Policy Sync
Running
Running now
89,214 rows
2m 03s...
Financial Reports
Completed
14 min ago
312,556 rows
6m 44s
Risk Analytics
Completed
28 min ago
47,103 rows
1m 58s
Data Quality3/3 Passed
Schema validation passed
No null primary keys
Row count within expected range
Performance Comparison
Old System
6 hrs 14 min
New System
3 min 48 sec
98.9% faster

Illustration based on actual project deliverable

The Process

Step-by-step delivery

Step 1

Assessment

Map all data sources, dependencies, and downstream consumers

Step 2

CDC Setup

Connect change data capture from SQL Server and 3 other sources

Step 3

Lakehouse Build

S3 raw layer + Spark transforms + dbt models + Redshift

Step 4

Pipeline Orchestration

Replace SSIS with Airflow DAGs and data quality checks

Step 5

Self-Service Analytics

Deploy dashboards and train the team on self-service queries

The Results

The numbers

4 min

Report Generation (was 6 hours)

70%

Lower Infrastructure Costs

10 weeks

Full Migration Timeline

Built with:AWS S3Apache SparkdbtRedshiftAirflowCDCTerraform