pracaon.pl

Senior Data Engineer (PySpark, NoSQL)

Remote, Poland
External job listing
EPAM

Salary to be agreed
IT and Telecommunications
Full-time
Remote
Requirements
  • 5+ years of experience as a Data Engineer or similar role

  • Strong hands-on experience with PySpark in production

  • Proven experience in data modeling, partitioning, indexing, and performance tuning in NoSQL systems

  • Strong programming skills in Python

  • Experience building and operating production-grade pipelines in cloud (Azure)

  • Experience with distributed NoSQL databases (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB)

  • Strong understanding of distributed systems and performance optimization

  • Experience with CI/CD, monitoring, troubleshooting, and production support

  • Strong analytical and communication skills (English B2+)

Responsibilities
  • Design and optimize large-scale data pipelines using PySpark

  • Build and maintain scalable ETL/ELT workflows in Azure

  • Troubleshoot production issues related to performance, latency, and availability

  • Work with distributed NoSQL technologies (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB, or similar)

  • Optimize Spark jobs (partitioning, execution plans, resource usage)

  • Implement best practices for scalability, security, and reliability

  • Collaborate with cross-functional teams on data-driven solutions

  • Contribute to automation, CI/CD, and operational improvements

Seniority
  • Senior

Nice to have
  • Experience with real-time / streaming data

  • Exposure to Data Science workflows

  • Knowledge of Big Data ecosystems

  • Experience with financial data

  • Familiarity with AI-assisted development or LLM tools

Description

We are seeking a Senior Data Engineer with strong expertise in Azure and PySpark, skilled in designing, implementing, and maintaining robust data processing solutions. This role focuses on building scalable, production-grade data systems, ensuring reliability, and optimizing performance in distributed environments.

Keywords / Skills
Data Software Engineering
Azure Cosmos DB
Azure SQL
Microsoft Azure
PySpark
Microsoft Fabric
This offer was imported from an external portal.