About
Hydra Host is a fast-growing bare-metal HPC infrastructure company. Our customers rely on our backbone for rock-solid infrastructure to host their cloud systems and to run mission-critical training and inference. Uptime, reliability, and observability are critical to everything we do. We’re hiring a Site Reliability Engineer to own QA systems and processes, monitoring, and backend service delivery, and to ensure our systems meet internal and customer SLAs. This person will work with the DevOps, datacenter, device, and marketplace teams to keep these goals aligned and the supporting systems in place across all domains.
We support mission-critical tooling for our enterprise customers. Our success depends on delivering exceptional customer support and operational excellence. We’ve integrated several dozen datacenters, and we are expanding quickly to disrupt the traditional cloud compute model.
We are a small team that relies on lean, automated processes to manage enterprise-scale infrastructure without enterprise-scale red tape or armies of technicians. This role is ideal for someone who thrives at the intersection of automation engineering, systems reliability, tooling development, and operational excellence.
What You’ll Do
As our Platform Reliability Engineer, you will:
Required Qualifications
Preferred Qualifications
What We Offer