Software Engineer - AI/ML Systems
Emerald
Location
Bay Area, Boston, Washington D.C.
Employment Type
Full time
Location Type
On-site
Department
Engineering
About Emerald
We’re at a pivotal moment in AI and energy: compute demand is surging, but power constraints threaten to stall innovation. Emerald AI operates at the nexus of AI and energy, pioneering solutions that let AI factories scale without overwhelming the grid. By making data centers flexible through the Emerald Conductor software platform, we can unlock immense AI growth with limited capital costs, all while stabilizing the grid and enabling more renewables.
Our team of AI, cloud, software, and energy experts is on a mission to unlock AI’s potential sustainably, backed by premier investors and industry leaders like Radical Ventures and NVIDIA. Read more about our team, story, and backers at https://www.emeraldai.co/.
About the Role
We are looking for a Software Engineer with a strong background in AI/ML systems to design and build intelligent orchestration models that drive automated decision-making across compute infrastructure. You will work at the intersection of systems engineering and applied machine learning, developing models and pipelines that optimize workload placement, resource scheduling, and power-performance tradeoffs in large-scale datacenter environments.
Key Responsibilities
Design and implement approaches for orchestrating AI/ML workloads, accounting for factors such as resource allocation, load balancing, data management, and performance.
Contribute to the design and implementation of performance monitoring for various optimization strategies.
Plan and conduct experiments with training and inference workloads using state-of-the-art AI models, such as LLMs.
Contribute to the design and development of mechanisms for power optimization and control, drawing on your deep knowledge of AI systems design.
Develop models for forecasting various parameters of data center compute jobs and demand.
Minimum Requirements
Bachelor's or Master's degree in Computer Science, Computer Engineering, or Electrical Engineering with 5+ years of experience, or a PhD in a relevant field with 2+ years of industry experience.
3+ years of experience in systems engineering or backend infrastructure
Strong programming skills in more than one language (Python, Go, Rust, C++, etc.) with a focus on high-performance and reliable systems
Experience with ML workflows and tools (e.g., PyTorch, scikit-learn), particularly for modeling system behavior
Deep understanding of distributed systems, resource scheduling, and telemetry instrumentation
Familiarity with platforms such as Kubernetes, Slurm, Ray, or similar schedulers
Preferred Requirements
Experience applying ML to systems problems (e.g., load prediction, anomaly detection, reinforcement learning for scheduling)
Knowledge of workload characteristics in AI/ML pipelines (training, inference, batch vs. real-time)
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and data infrastructure (Kafka, Parquet, etc.)
Contributions to open-source infrastructure projects
What We Offer
A chance to join a team of industry leaders and experts working at the nexus of two pivotal industries in a collaborative and collegial environment.
Building from zero to one: help the team build from the ground up. Beyond the core responsibilities of this role, you will have the opportunity to contribute to strategy development, GTM planning, org design, and customer and investor interfacing.
Comprehensive benefits package including insurance for medical, dental, and vision, in addition to 401(k) matching.
Location flexibility across our three hubs in Washington, D.C., Boston, and the San Francisco Bay Area, with one work-from-home day per week.