Freelance PySpark Expert – Data Warehouse Deployment
Location: Luxembourg (4 days on-site, 1 day remote per week)
Contract Type: Freelance
Duration: 80 days (Time & Material)
Languages: English (required); French/German/Luxembourgish (a plus)
About the Role
We are seeking a highly skilled PySpark expert to support the deployment of a high-performance real-time data warehouse. As an external consultant, you will play a critical role in optimizing real-time data ingestion and processing while ensuring a smooth production rollout.
Key Focus Areas:
- Apache PySpark for large-scale data transformation
- Apache Kafka for real-time streaming data
- Debezium for Change Data Capture (CDC) from PostgreSQL
This role requires hands-on expertise in PySpark, Kafka, and Debezium, as well as a collaborative mindset to help our internal team upskill while delivering a robust data infrastructure.
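For context, a minimal sketch of the kind of pipeline this stack implies is shown below: PySpark Structured Streaming consuming Debezium CDC events from a Kafka topic. The broker address, topic name, and schema are illustrative placeholders, and the Debezium JSON converter is assumed to emit events without the schema wrapper.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = (
    SparkSession.builder
    .appName("cdc-ingestion-sketch")  # hypothetical app name
    .getOrCreate()
)

# Simplified schema for the "after" image of a Debezium change event (hypothetical table).
after_schema = StructType([
    StructField("id", LongType()),
    StructField("status", StringType()),
])
envelope_schema = StructType([
    StructField("op", StringType()),   # c = create, u = update, d = delete
    StructField("after", after_schema),
    StructField("ts_ms", LongType()),
])

# Read the CDC topic that Debezium publishes for the PostgreSQL source table.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder brokers
    .option("subscribe", "pg.public.orders")          # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON value and keep inserts/updates for downstream warehouse loading.
changes = (
    raw.select(from_json(col("value").cast("string"), envelope_schema).alias("evt"))
    .where(col("evt.op").isin("c", "u"))
    .select("evt.after.*", "evt.ts_ms")
)

query = (
    changes.writeStream
    .format("console")  # stand-in sink; a real deployment would target the warehouse
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```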
Key Responsibilities
- Design and implement efficient Spark-based data processing workflows
- Optimize real-time data ingestion pipelines for performance and reliability
- Apply best practices for scalability and maintainability of the data warehouse
- Fine-tune Kafka and Debezium for seamless CDC log processing
- Train and support the internal team for a smooth post-deployment transition
Required Skills & Qualifications
- Expert-level proficiency in PySpark (performance optimization, structured job management)
- Strong experience in real-time data processing and transformation
- Knowledge of data engineering best practices and data warehouse modeling
- Experience with Kafka and Debezium for CDC log management
- Proficiency in PostgreSQL; TimescaleDB experience is a plus
- Understanding of real-time data architectures and CI/CD pipelines
- Fluency in English (spoken & written); French, German, or Luxembourgish is a plus
Work Schedule & Contract Details
- On-site: 4 days per week
- Remote: 1 day per week
- Contract Type: Freelance (Time & Material)
- Duration: 80 days
Ready to bring your expertise and make an impact? Apply now!