Senior Data Engineer (PySpark)
Remote, Poland
EPAM
Salary to be agreed
Requirements
5+ years of experience in Data Software Engineering or related engineering roles
Solid experience with PySpark and SparkSQL
Experience with Cosmos DB (NoSQL API)
Hands-on experience with OneLake or Delta Lake and OpenLake concepts
Knowledge of Data Factory Gen2 and M code
Experience with CI/CD pipelines using Azure DevOps or equivalent
Good understanding of Azure services
Experience integrating data solutions with Power BI
Experience with Azure Fabric would be an asset
Strong problem-solving and analytical skills
Ability to work independently on complex tasks
Experience working in Agile or Scrum environments
Upper-intermediate proficiency in English (B2+)
Responsibilities
Implement data processing and transformation with Python, PySpark and SparkSQL
Work with OneLake (Delta / OpenLake) for efficient data storage and analytics
Develop and support solutions using Cosmos DB (NoSQL API)
Contribute to Fabric workloads including Data Engineering, Data Factory Gen2 and Lakehouse
Design, develop and maintain scalable data pipelines using Azure Fabric
Implement and maintain CI/CD pipelines and follow DevOps best practices
Integrate data solutions with Power BI for reporting and analytics
Collaborate with AI, data science and product teams to support AI-driven use cases
Ensure data quality, performance, security and reliability
Participate in Agile ceremonies and contribute to sprint delivery
Support production issues and continuous improvements
Seniority
Senior
Description
We are looking for a Senior Data Engineer with expertise in AI-enabled data platforms and PySpark. This role involves designing, building and optimizing modern data pipelines and analytics solutions, collaborating with architects, lead engineers and business stakeholders to deliver robust, scalable and AI-integrated data solutions.