Senior Data Engineer (PySpark, NoSQL)
Remote, Poland
EPAM
Salary to be determined
Requirements
5+ years of experience as a Data Engineer or similar role
Strong hands-on experience with PySpark in production
Proven experience in data modeling, partitioning, indexing, and performance tuning in NoSQL systems
Strong programming skills in Python
Experience building and operating production-grade pipelines in the cloud (Azure)
Experience with distributed NoSQL databases (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB)
Strong understanding of distributed systems and performance optimization
Experience with CI/CD, monitoring, troubleshooting, and production support
Strong analytical and communication skills (English B2+)
Responsibilities
Design and optimize large-scale data pipelines using PySpark
Build and maintain scalable ETL/ELT workflows in Azure
Troubleshoot production issues related to performance, latency, and availability
Work with distributed NoSQL technologies (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB, or similar)
Optimize Spark jobs (partitioning, execution plans, resource usage)
Implement best practices for scalability, security, and reliability
Collaborate with cross-functional teams on data-driven solutions
Contribute to automation, CI/CD, and operational improvements
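One concrete facet of the Spark-tuning responsibilities above is choosing a partition count. A common rule of thumb targets partitions of roughly 128 MB; a minimal sketch of that sizing calculation (the function name and default target size are illustrative assumptions, not part of the posting):

```python
# Illustrative only: a rough rule of thumb sometimes used when tuning Spark
# partitioning is to aim for partitions of roughly 128 MB each.

def suggest_num_partitions(total_bytes: int,
                           target_partition_bytes: int = 128 * 1024 * 1024,
                           min_partitions: int = 1) -> int:
    """Suggest a partition count so each partition is ~target_partition_bytes."""
    if total_bytes <= 0:
        return min_partitions
    # Ceiling division: round up so no partition exceeds the target size.
    return max(min_partitions, -(-total_bytes // target_partition_bytes))

# Example: a 10 GiB dataset at ~128 MiB per partition suggests 80 partitions.
print(suggest_num_partitions(10 * 1024**3))  # prints 80
```

In practice a number like this would feed `DataFrame.repartition(n)`; real tuning also weighs data skew, shuffle cost, and available executor cores.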
Seniority
Senior
Nice to have
Experience with real-time / streaming data
Exposure to Data Science workflows
Knowledge of Big Data ecosystems
Experience with financial data
Familiarity with AI-assisted development or LLM tools
Description
We are seeking a Senior Data Engineer with strong expertise in Azure and PySpark, skilled in designing, implementing, and maintaining robust data processing solutions. This role focuses on building scalable, production-grade data systems, ensuring reliability, and optimizing performance in distributed environments.