
Back to Work
Datalake Developer
Nasdaq Verafin

Engineering High-Volume Pipelines
During my two terms as a Datalake Developer, I was responsible for the ingestion and processing of massive telemetry datasets. My primary focus was optimizing Apache Spark jobs running on AWS Glue to ensure data integrity and reduce processing time.
Technical Highlights
Database Splitting Logic
Implemented complex logic in Scala to split incoming streams into region-specific databases, ensuring compliance with data residency laws.
AWS Step Functions
Orchestrated multi-stage ETL processes using Step Functions, improving error handling and retries for nightly batch jobs.
Stats
2
Work Terms
Scala
Primary Language
AWS
Infrastructure
Tech Stack
ScalaApache SparkAWS GlueAthenaJavaS3
Impact
- Reduced pipeline latency
- Enforced Data Integrity
- Optimized Storage Costs