Senior ML Infrastructure Engineer
IPTS
Specialism: Infrastructure
Project: A company that specializes in building a distributed LLM inference network, aggregating GPU capacity from around the globe into a unified compute plane capable of running large-scale language models.
Key Skills: TypeScript, Python, Go, Rust, C++, Kubernetes, Nomad, ChatGPT, Claude, Cursor, vLLM, TensorRT-LLM, GPU Programming and Optimization, CUDA
Location: San Francisco, CA
Role Detail: We are seeking a Senior ML Infrastructure Engineer to help develop large-scale, fault-tolerant systems that handle millions of LLM inference requests each day. This role focuses on designing and building the core systems behind our globally distributed inference network, tackling challenges that span distributed systems, machine learning, and resource management.
Responsibilities:
- Design scalable distributed systems.
- Develop models for efficient resource allocation.
- Optimize network performance and reliability.
- Build comprehensive monitoring solutions.
- Collaborate with team members to advance the infrastructure and product.
Requirements:
- Proven ability to solve complex problems and thrive in a startup setting.
- More than 5 years of experience developing high-performance systems.
- Proficiency in TypeScript, Python, and at least one of Go, Rust, or C++.
- Strong understanding of distributed systems principles.
- Familiarity with orchestrators and scheduling tools such as Kubernetes and Nomad.
- Experience incorporating AI tools into the development workflow (e.g., ChatGPT, Claude, Cursor).
- Background in LLM inference engines (e.g., vLLM, TensorRT-LLM) is an advantage.
- Experience with GPU programming and optimization; CUDA expertise is a plus.