Description
Summary:
Meta is looking for software engineers to help scale and improve the efficiency of large AI/ML workloads. Part of this is enabling high performance interconnect (HPI) solutions and optimising collective operations to improve machine learning model performance. This is an opportunity to work within a highly skilled team, collaborate with a large set of cross-functional partners, and help bring next-generation large cluster architectures to life.
Required Skills:
Software Engineer – Systems Generalist Responsibilities:
- Support networking and compute hardware acceleration techniques to improve ML inference and training model performance
- Implement ML model optimisation features
- Debug custom and third-party multi-host, accelerator-enabled AI platforms
- Software development using C++/C and Python
- Work closely with other teams to deliver impact
- Develop and improve features and innovations
- Extend and optimize large-scale learning collective operations
Minimum Qualifications:
- Bachelor’s degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- Specialized experience in one or more of the following machine learning/deep learning domains: hardware accelerators, AI infrastructure, or high performance computing
- Experience with ML systems and AI frameworks (such as PyTorch)
- Solid experience developing in C++/C
- English language proficiency
Preferred Qualifications:
- GPU architecture experience
- Experience with distributed systems at scale
- Parallel programming in MPI, OpenMP, POSIX threads, or similar distributed frameworks or languages
- Experience with large-scale machine learning clusters
Industry: Internet