Lead Site Reliability Engineer

Recruiting From Scratch

Full-time

Remote

United States

$170,000 - $200,000 USD yearly

Who is Recruiting from Scratch:

Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. We partner closely with our clients to deeply understand their needs, then connect them with top-tier candidates who are not only highly skilled but also the right fit for the company’s culture and vision. Our mission is simple: place the best people in the right roles to drive long-term success for both clients and candidates.

https://www.recruitingfromscratch.com/

Role: Lead Site Reliability Engineer
Location: Remote (U.S.-based)
Company Stage of Funding: Growth-stage venture-backed (Series B/C equivalent)
Office Type: Remote-first
Salary: $170K – $200K base

Company Description

Our client is a rapidly growing, mission-driven technology company serving defense, government, and critical infrastructure enterprises. Their secure collaboration platform helps organizations operate in high-stakes environments where resilience, adaptability, and compliance are paramount. The team has been remote-first since inception and prides itself on hiring exceptional engineers globally.

What You Will Do

As the Lead Site Reliability Engineer, you’ll drive the architecture, reliability, and operational excellence of a platform supporting mission-critical organizations. This role is a 70/30 split between technical leadership and hands-on engineering. You will:

Define the strategy and roadmap for the SRE function, aligning infrastructure with product and business goals.
Lead the design, deployment, and optimization of production-grade, compliant cloud environments.
Build observability, monitoring, and alerting frameworks to ensure performance and reliability at scale.
Own incident management processes, including on-call rotations, root cause analysis, and reliability improvements.
Partner with security and compliance teams to meet federal and industry standards.
Champion automation to improve efficiency and scale operations.
Oversee cost management and capacity planning for cloud infrastructure.
Mentor engineers and foster a culture of collaboration and technical excellence.

Ideal Candidate Background

5+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
Deep expertise in Kubernetes and infrastructure-as-code (Terraform preferred).
Strong background with AWS or other major cloud providers.
Skilled in designing monitoring, alerting, and performance optimization strategies.
Proven troubleshooting and incident management abilities for distributed systems.
Proficiency in at least one scripting/programming language for automation.
Strong communicator with experience leading cross-functional initiatives.
Comfortable working in a distributed, remote-first environment.

Preferred

Familiarity with Grafana, Prometheus, and modern observability stacks.
Experience designing high-availability and disaster recovery architectures.
Exposure to GCP or Azure in addition to AWS.
Experience in highly regulated industries (defense, finance, healthcare, or critical infrastructure).
Knowledge of compliance frameworks such as FedRAMP, NIST 800-53, or DoD standards.
Prior leadership of distributed teams.
Cloud or DevOps certifications (e.g., CKA, CKAD, AWS Solutions Architect).
Open-source contributions in reliability, DevOps, or infrastructure tooling.

Compensation and Benefits

Competitive base salary: $170K – $200K
Equity participation
Fully remote U.S. role with a remote-first culture
Mission-driven work directly supporting organizations in defense, government, and critical infrastructure
Growth-stage company with significant funding and strong customer retention
