Lead Site Reliability Engineer

Recruiting From Scratch
Full-time
Remote
United States
$170,000 - $200,000 USD yearly
Who is Recruiting from Scratch: 
Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. We partner closely with our clients to deeply understand their needs, then connect them with top-tier candidates who are not only highly skilled but also the right fit for the company’s culture and vision. Our mission is simple: place the best people in the right roles to drive long-term success for both clients and candidates.
 
 

Role: Lead Site Reliability Engineer
Location: Remote (U.S. based)
Company Stage of Funding: Growth-stage venture-backed (Series B/C equivalent)
Office Type: Remote-first
Salary: $170K – $200K base

Company Description

Our client is a rapidly growing, mission-driven technology company serving defense, government, and critical infrastructure enterprises. Their secure collaboration platform helps organizations operate in high-stakes environments where resilience, adaptability, and compliance are paramount. The team has been remote-first since inception and prides itself on hiring exceptional engineers globally.

What You Will Do

As the Lead Site Reliability Engineer, you’ll drive the architecture, reliability, and operational excellence of a platform supporting mission-critical organizations. This role is a 70/30 split between technical leadership and hands-on engineering. You will:

  • Define the strategy and roadmap for the SRE function, aligning infrastructure with product and business goals.

  • Lead the design, deployment, and optimization of production-grade, compliant cloud environments.

  • Build observability, monitoring, and alerting frameworks to ensure performance and reliability at scale.

  • Own incident management processes, including on-call rotations, root cause analysis, and reliability improvements.

  • Partner with security and compliance teams to meet federal and industry standards.

  • Champion automation to improve efficiency and scale operations.

  • Oversee cost management and capacity planning for cloud infrastructure.

  • Mentor engineers and foster a culture of collaboration and technical excellence.

Ideal Candidate Background

  • 5+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.

  • Deep expertise in Kubernetes and infrastructure-as-code (Terraform preferred).

  • Strong background with AWS or other major cloud providers.

  • Skilled in designing monitoring, alerting, and performance optimization strategies.

  • Proven troubleshooting and incident management abilities for distributed systems.

  • Proficiency in at least one scripting/programming language for automation.

  • Strong communicator with experience leading cross-functional initiatives.

  • Comfortable working in a distributed, remote-first environment.

Preferred

  • Familiarity with Grafana, Prometheus, and modern observability stacks.

  • Experience designing high-availability and disaster recovery architectures.

  • Exposure to GCP or Azure in addition to AWS.

  • Experience in highly regulated industries (defense, finance, healthcare, or critical infrastructure).

  • Knowledge of compliance frameworks such as FedRAMP, NIST 800-53, or DoD standards.

  • Prior leadership of distributed teams.

  • Cloud or DevOps certifications (e.g., CKA, CKAD, AWS Solutions Architect).

  • Open-source contributions in reliability, DevOps, or infrastructure tooling.

Compensation and Benefits

  • Competitive base salary: $170K – $200K

  • Equity participation

  • Fully remote U.S. role with a remote-first culture

  • Mission-driven work directly supporting organizations in defense, government, and critical infrastructure

  • Growth-stage company with significant funding and strong customer retention