Description

The role covers optimising ETL pipelines and maintaining all Spark jobs, building a data lake, and carrying out efficient integration with our data providers via various API endpoints and data representation formats.

Key Responsibilities
  • Setting up monitoring of key performance metrics and overall system behaviour so that any anomaly is detected and acted on promptly
  • Continuously improving how data is processed and stored, based on feedback and the needs of the business and other teams
  • Building and deploying an in-house distributed ETL pipeline for processing petabytes of data per day
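
For illustration only, a minimal PySpark sketch of the kind of ETL job described above: it reads raw provider data, cleans it, and writes partitioned output to a data lake. The storage paths, column names (event_id, event_ts), and the row-count metric are hypothetical placeholders, not details of this team's actual stack.

    # Minimal, illustrative PySpark job: read raw JSON delivered by a data
    # provider, deduplicate and clean it, then write partitioned Parquet into
    # a data lake location. Paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("provider-etl").getOrCreate()

    # Hypothetical raw landing zone populated from a provider's API.
    raw = spark.read.json("s3://example-lake/raw/provider_feed/")

    cleaned = (
        raw.dropDuplicates(["event_id"])                    # assumed unique key
           .withColumn("event_date", F.to_date("event_ts")) # assumed timestamp column
           .filter(F.col("event_date").isNotNull())
    )

    # Stand-in for a real monitoring hook: emit a simple per-run metric.
    print(f"rows processed: {cleaned.count()}")

    # Hypothetical curated zone of the data lake, partitioned for downstream reads.
    (
        cleaned.write.mode("append")
               .partitionBy("event_date")
               .parquet("s3://example-lake/curated/provider_events/")
    )

    spark.stop()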