GPU Software Engineer (HPC / Deep Learning Optimization)
Remote Poland, PolskaWichtige Merkmale des Angebots
DevOps / Cloud: AWS, Azure, Docker, Kubernetes
Auf der Suche nach Experten – Senior/Experte
Englisch B2/C1
Vollzeit
Remote-Arbeit - kein Pendeln
Description
We are looking for a Software Engineer focused on GPU computing, HPC workloads, and Deep Learning inference optimization on Windows platform. The project is aimed at improving performance and efficiency of GPU-based workloads, including compute kernels and inference pipelines. The role is not limited to graphics APIs and is suitable for candidates with strong experience in CUDA, OpenCL, or similar technologies, as well as shader-based optimization.
What we offer
Global Relocation - (Relocation options; Experience in an international environment; Cross-cultural experience)
Recognition and Evaluation - (Feedback culture; Regular appraisals)
Time Off - (Annual holiday - 20 or 26 days. The duration of the leave depends on the overall seniority; Occasional leave - 1 or 2 days/ depending on the circumstances; Child care leave - 2 days or 16 hours per year; Absence due to force majeure - 2 days or 16 hours per year; Maternity Leave - 20 weeks; Parental Leave - 41 weeks; Paternity Leave - 14 days)
Luxoft Training Center - (Expert-led tech courses covering basic to advanced topics; Internal instructor-led soft skills courses; Comprehensive in-house self-learning resources for both soft and hard skills; Access to external self-learning libraries like ProQuest eBook and Udemy for Business; Cloud Programs: MS Cloud Academy, AWS Partner Academy, Google Cloud Academy; Custom Learning Programs: upskilling, reskilling, technical mentorship; Leadership Programs for Managers)
Well-being and Work-life Balance - (Multisport card; Possibility to order Multisport card at the corporate rate for family members; LuxGood Program: wellbeing seminars, contests, relaxation sessions, yoga sessions, etc.; One Team Program: Buddy for each New Joiner; seminars, meeting and workplace space to support integration with local community and culture; “Hire me” workshops for partners; Preferential banking offer; Preferential car leasing offer; Cafeteria program discounts for shops, cinema tickets, holiday offers; Luxoft Social Benefit Fund: sport and recreation benefits, the possibility to receive financial support)
Health Care - (Private Healthcare Insurance with unlimited access to specialists; Full dental support; Travel Insurance; Possibility to add private healthcare coverage for family members at the corporate rate; Life insurance at the corporate rate for employees and family members, including payment of the basic package for the employee by the employer; Reimbursement for corrective glasses)
Company Events and Friendly Environment - (Many fun social activities organized by the Luxoft team offline in your city; Online entertainment events for whole company and local team events; A workplace where you’re treated with respect within a multicultural team)
Internal Mobility - (Rotation between projects and accounts; New career opportunities)
Self-Learning Library
CSR Projects
Other
Languages: English: B2 Upper Intermediate
Seniority: Senior
Requirements
Hands on experience writing and optimizing GPU compute kernels in at least one framework: CUDA, HIP, OpenCL, SYCL, DirectCompute / HLSL compute shaders, Metal compute, or equivalent.
Ability to profile and performance tune GPU code (memory, compute, latency) as part of that work.
Strong knowledge of C++
Experience with HPC or Deep Learning inference optimization
Experience with profiling tools (Nsight, Radeon GPU Profiler, PIX, etc.)
Experience with Deep Learning frameworks (TensorRT, ONNX Runtime, PyTorch, DirectML, etc.)
Understanding of graphics pipelines and rendering basics
Experience with a graphics API (DirectX, Vulkan, Metal, etc.)
Experience with Windows platform GPU development
Experience with CI/CD, version control, or automated testing
Responsibilities
Develop, optimize, and maintain GPU compute kernels using C++ and a GPU programming framework (CUDA, HIP, OpenCL, SYCL, DirectCompute / HLSL compute shaders, Metal compute, or equivalent).
Profile GPU workloads and tune memory, compute, and latency to improve performance and efficiency.
Analyze performance bottlenecks and apply targeted optimizations.
Debug and resolve performance and stability issues.
Apply kernel optimization to HPC or Deep Learning inference pipelines where needed.
Collaborate with engineers, QA, and stakeholders.
Follow coding standards and contribute to technical documentation.
Stichwörter / Fähigkeiten