Datalake

Datalake Developer

Nasdaq Verafin

Engineering High-Volume Pipelines

During my two work terms as a Datalake Developer, I was responsible for ingesting and processing large-scale telemetry datasets. My primary focus was optimizing Apache Spark jobs running on AWS Glue to ensure data integrity and reduce processing time.
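
As a rough illustration of the shape of these jobs (the S3 locations, field names, and partition column below are placeholders rather than the production values), a Glue-hosted Spark job in Scala reads raw telemetry from S3, drops malformed records, and rewrites the data as date-partitioned Parquet:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TelemetryCompaction {
  def main(args: Array[String]): Unit = {
    // On AWS Glue the Spark session is supplied by the runtime; building one
    // here keeps the sketch runnable as a plain Spark application too.
    val spark = SparkSession.builder()
      .appName("telemetry-compaction")
      .getOrCreate()

    // Placeholder input location for raw telemetry events.
    val raw = spark.read.json("s3://example-telemetry/raw/")

    raw
      .filter(col("eventId").isNotNull)            // drop malformed records
      .withColumn("dt", col("eventTime").cast("date"))
      .repartition(col("dt"))                      // group writes by partition value
      .write
      .mode("overwrite")
      .partitionBy("dt")
      .parquet("s3://example-telemetry/curated/")  // columnar output, queryable via Athena
  }
}
```

Partitioning by date and writing Parquet is also what keeps downstream Athena queries cheap, since they can prune partitions and read only the columns they need.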

Technical Highlights

Database Splitting Logic

Implemented complex logic in Scala to split incoming streams into region-specific databases, ensuring compliance with data residency laws.
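
A minimal sketch of that pattern follows; the region codes, the `region` column, and the per-region database naming are assumptions for illustration, not the actual production logic:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

object RegionSplitter {
  // Hypothetical region codes; the real set is dictated by data residency requirements.
  private val Regions = Seq("us", "eu", "apac")

  /** Route each region's slice of the incoming data into its own catalog database. */
  def splitByRegion(events: DataFrame): Unit =
    Regions.foreach { region =>
      events
        .filter(col("region") === region)
        .write
        .mode(SaveMode.Append)
        .format("parquet")
        .saveAsTable(s"telemetry_$region.events") // e.g. one Glue database per region
    }
}
```

Keeping each region in its own database, rather than as partitions of a single table, makes it straightforward to apply per-region access controls and storage locations.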

AWS Step Functions

Orchestrated multi-stage ETL processes using Step Functions, improving error handling and retries for nightly batch jobs.
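
For context on what that configuration looks like, the sketch below embeds a simplified Amazon States Language definition in Scala; the job name, retry intervals, and failure-notification state are illustrative placeholders, not the production setup:

```scala
object NightlyEtlStateMachine {
  // Simplified state machine: run a Glue job synchronously, retry transient
  // failures with exponential backoff, and route anything else to an alert.
  val definition: String =
    """{
      |  "StartAt": "RunNightlyGlueJob",
      |  "States": {
      |    "RunNightlyGlueJob": {
      |      "Type": "Task",
      |      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      |      "Parameters": { "JobName": "nightly-telemetry-etl" },
      |      "Retry": [{
      |        "ErrorEquals": ["States.TaskFailed"],
      |        "IntervalSeconds": 60,
      |        "MaxAttempts": 3,
      |        "BackoffRate": 2.0
      |      }],
      |      "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "NotifyFailure" }],
      |      "End": true
      |    },
      |    "NotifyFailure": {
      |      "Type": "Task",
      |      "Resource": "arn:aws:states:::sns:publish",
      |      "Parameters": {
      |        "TopicArn": "arn:aws:sns:us-east-1:123456789012:etl-alerts",
      |        "Message": "Nightly telemetry ETL failed"
      |      },
      |      "End": true
      |    }
      |  }
      |}""".stripMargin
}
```

Declaring retries at the state-machine level keeps the Glue job itself simple: transient failures are retried with backoff, and only persistent failures surface as alerts.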

Stats

  • 2 Work Terms
  • Scala (Primary Language)
  • AWS (Infrastructure)

Tech Stack

Scala · Apache Spark · AWS Glue · Athena · Java · S3

Impact

  • Reduced pipeline latency
  • Enforced data integrity
  • Optimized storage costs