Datalake

Datalake Developer

Nasdaq Verafin

Engineering High-Volume Pipelines

During my two work terms as a Datalake Developer, I was responsible for ingesting and processing large-scale telemetry datasets. My primary focus was optimizing Apache Spark jobs running on AWS Glue to ensure data integrity and reduce processing time.
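
As a rough illustration of the shape of these jobs (the S3 locations, field names, and partition column below are placeholders rather than the production values), a Glue-hosted Spark job in Scala reads raw telemetry from S3, drops malformed records, and rewrites the data as date-partitioned Parquet:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TelemetryCompaction {
  def main(args: Array[String]): Unit = {
    // On AWS Glue the Spark session is supplied by the runtime; building one
    // here keeps the sketch runnable as a plain Spark application too.
    val spark = SparkSession.builder()
      .appName("telemetry-compaction")
      .getOrCreate()

    // Placeholder input location for raw telemetry events.
    val raw = spark.read.json("s3://example-telemetry/raw/")

    raw
      .filter(col("eventId").isNotNull)            // drop malformed records
      .withColumn("dt", col("eventTime").cast("date"))
      .repartition(col("dt"))                      // group writes by partition value
      .write
      .mode("overwrite")
      .partitionBy("dt")
      .parquet("s3://example-telemetry/curated/")  // columnar output, queryable via Athena
  }
}
```

Partitioning by date and writing Parquet is also what keeps downstream Athena queries cheap, since they can prune partitions and read only the columns they need.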

Technical Highlights

Database Splitting Logic

Implemented complex logic in Scala to split incoming streams into region-specific databases, ensuring compliance with data residency laws.
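
A minimal sketch of that pattern follows; the region codes, the `region` column, and the per-region database naming are assumptions for illustration, not the actual production logic:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

object RegionSplitter {
  // Hypothetical region codes; the real set is dictated by data residency requirements.
  private val Regions = Seq("us", "eu", "apac")

  /** Route each region's slice of the incoming data into its own catalog database. */
  def splitByRegion(events: DataFrame): Unit =
    Regions.foreach { region =>
      events
        .filter(col("region") === region)
        .write
        .mode(SaveMode.Append)
        .format("parquet")
        .saveAsTable(s"telemetry_$region.events") // e.g. one Glue database per region
    }
}
```

Keeping each region in its own database, rather than as partitions of a single table, makes it straightforward to apply per-region access controls and storage locations.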

AWS Step Functions

Orchestrated multi-stage ETL processes using Step Functions, improving error handling and retries for nightly batch jobs.
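
For context on what that configuration looks like, the sketch below embeds a simplified Amazon States Language definition in Scala; the job name, retry intervals, and failure-notification state are illustrative placeholders, not the production setup:

```scala
object NightlyEtlStateMachine {
  // Simplified state machine: run a Glue job synchronously, retry transient
  // failures with exponential backoff, and route anything else to an alert.
  val definition: String =
    """{
      |  "StartAt": "RunNightlyGlueJob",
      |  "States": {
      |    "RunNightlyGlueJob": {
      |      "Type": "Task",
      |      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      |      "Parameters": { "JobName": "nightly-telemetry-etl" },
      |      "Retry": [{
      |        "ErrorEquals": ["States.TaskFailed"],
      |        "IntervalSeconds": 60,
      |        "MaxAttempts": 3,
      |        "BackoffRate": 2.0
      |      }],
      |      "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "NotifyFailure" }],
      |      "End": true
      |    },
      |    "NotifyFailure": {
      |      "Type": "Task",
      |      "Resource": "arn:aws:states:::sns:publish",
      |      "Parameters": {
      |        "TopicArn": "arn:aws:sns:us-east-1:123456789012:etl-alerts",
      |        "Message": "Nightly telemetry ETL failed"
      |      },
      |      "End": true
      |    }
      |  }
      |}""".stripMargin
}
```

Declaring retries at the state-machine level keeps the Glue job itself simple: transient failures are retried with backoff, and only persistent failures surface as alerts.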

Stats

  • 2 Work Terms
  • Scala (Primary Language)
  • AWS (Infrastructure)

Tech Stack

Scala · Apache Spark · AWS Glue · Athena · Java · S3

Impact

  • Reduced pipeline latency
  • Enforced data integrity
  • Optimized storage costs