Description

Summary:

Meta is looking for software engineers to help scale and improve the efficiency of large AI/ML workloads. Part of this is enabling high-performance interconnect (HPI) solutions and optimising collective operations to improve machine learning model performance. This is an opportunity to work within a highly skilled team, collaborating with a large set of cross-functional partners and helping to bring next-generation large-cluster architectures to life.

Required Skills:

Software Engineer – Systems Generalist Responsibilities:

Support networking and compute hardware acceleration techniques to improve ML inference and training model performance
Implement ML model optimisation features
Debug custom and third-party multi-host, accelerator-enabled AI platforms
Develop software in C++/C and Python
Work closely with other teams to deliver impact
Develop and improve features and innovations
Extend and optimize large-scale learning collective operations

Minimum Qualifications:

Bachelor’s degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
Specialized experience in one or more of the following machine learning/deep learning domains: hardware accelerators, AI infrastructure, or high-performance computing
Experience with ML systems and AI frameworks (such as PyTorch)
Solid experience developing in C++/C
English language proficiency

Preferred Qualifications:

GPU architecture experience
Experience with distributed systems at scale
Parallel programming in MPI, OpenMP, POSIX threads, or similar distributed frameworks or languages
Experience with large-scale machine learning clusters

Industry: Internet
