Solution Architect (Kernel Optimization & ML Performance) | Associate Management
Kraków, Lesser Poland Voivodeship, Poland
Key offer highlights
Hybrid model - partly remote
12+ years of experience
Architect role
Description
Join us as a Solution Architect (Kernel Optimization & ML Performance) and play a key role in shaping impactful, scalable solutions that drive real business value. You will work at the intersection of advanced hardware acceleration and machine learning infrastructure, translating complex performance and optimization requirements into robust architectural designs. This role offers the opportunity to collaborate with global, cross-functional teams in a dynamic and fast-paced environment, working closely with ML researchers, compiler engineers, and systems architects. You’ll have a direct influence on the technical direction of high-performance ML solutions, solution quality, and overall efficiency across large-scale AI workloads. At EPAM, we value innovation, ownership, and a proactive mindset, giving you the space to make a tangible technological impact. If you are ready to take on a strategic architect-level role in AI infrastructure and grow your career in a global setting, we encourage you to apply. 
Requirements
Bachelor’s or Master’s degree in Computer Science or equivalent practical experience
12+ years of software engineering experience, including 5+ years in architecture or technical leadership roles
In-depth knowledge of ML frameworks (JAX, PyTorch, TensorFlow) and core ML concepts
Hands-on expertise in performance optimization at the kernel level targeting TPUs/GPUs
Strong experience with C++/Python for high-performance computing
Solid grasp of compiler principles, graph optimizations, and toolchains such as MLIR or OpenXLA
Proven experience designing scalable, production-grade ML systems with emphasis on performance and efficiency
Excellent communication, stakeholder management, and solution delivery skills
Responsibilities
Define and own the architecture for performance-critical ML workloads, leveraging custom kernels on TPUs and GPUs
Create a strategic roadmap for kernel optimization, framework integration, and large-scale performance improvement
Collaborate with the client’s technical leadership, ML researchers, and engineers to capture requirements and design scalable solutions
Evaluate and select the right technologies, toolchains, and design patterns to optimize compute-intensive operations
Guide the development of benchmarking infrastructure, autotuning frameworks, performance profiling tools, and regression suites
Advocate for ML performance best practices, influencing framework and compiler enhancements across teams
Provide technical leadership and mentorship to development teams implementing architectural solutions
Ensure adherence to security, scalability, and maintainability standards in solution design
Seniority
Associate Management
Nice to have
Familiarity with emerging hardware accelerators, heterogeneous compute, and scale-out performance patterns
Experience with autotuning systems, benchmarking methodologies, and performance profiling tools
Contributions to open-source projects in ML performance, kernels, or developer infrastructure
Demonstrated ability to present architectural solutions to technical and non-technical audiences