- Optimising ETL pipelines and maintaining all Spark jobs
- Building a data lake
- Integrating efficiently with our data providers via various API endpoints and data representation formats
- Building and deploying an in-house distributed ETL pipeline for processing petabytes of data per day
- Setting up monitoring of key performance metrics and overall system behaviour to react promptly when anomalies are detected
- Continuously improving how data is processed and stored, based on feedback and the needs of the business and other teams