Our client is looking for a data engineer to join the team. Their department delivers data warehouse and big data solutions (based on a Cloudera stack) for internal and external customers.
More than 10 data engineering teams work on these platforms, and it is important that these teams share a common way of scheduling and orchestrating ETL jobs. The department has therefore developed a custom application that offers metadata-driven scheduling, orchestration and monitoring capabilities. You will be part of the team that adds features to this tool.
In addition, you will be involved in other data engineering tasks on the big data platform.
You have at least 2 years of working experience in developing data ingestion, data processing and analytical pipelines for big data, relational databases and data warehouse solutions, and in deploying ETL solutions on premises and/or in the cloud
Deep knowledge of coding in Python, PySpark or another programming language such as Java, and of the Linux OS (e.g. bash)
Deep knowledge of working with various file formats such as CSV, XML, JSON and Apache Parquet
Deep knowledge of SQL
Knowledge of version control tools like GitLab and CI/CD processes
A degree in IT/Engineering/Sciences or strong relevant technical experience
It is an added value if you have experience with the Cloudera technology stack (HDFS, Hive, Spark, Atlas, Ranger, Kafka, NiFi, …), preferably in both batch and real-time use cases
You are fluent in English, both spoken and written. Knowledge of Dutch and/or French is an asset.