Senior Site Reliability Engineer

Hyperdrive Recruiting
Full-time
Remote
United States
$130,000 - $170,000 USD yearly

We are looking for a talented Senior Site Reliability Engineer (SRE) to ensure the reliability, observability, and scalability of a globally distributed payments platform.

We are a financial technology company that provides an open payments platform, enabling and optimizing digital transactions with a comprehensive payment services marketplace.

Job Duties

  • Ensure the reliability, availability, and performance of a globally distributed payments platform, processing billions monthly, through monitoring, automation, and continuous improvement.
  • Collaborate with development teams to improve the reliability and performance of applications.
  • Implement and maintain robust observability solutions, enabling proactive identification, alerting, and resolution of issues.
  • Lead incident response efforts, including participating in a shared on-call rotation to maintain 24/7 system reliability, conducting root cause analysis, and implementing preventative measures.
  • Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity.
  • Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks.
  • Lead by example, infusing modern SRE best practices and fostering a culture of reliability and performance.
  • Provide technical guidance and mentorship to team members.

Ideal Background

  • Hands-on experience with Datadog, OpenTelemetry, Sentry, Sumo Logic or similar monitoring and observability platforms.
  • Proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code; experience with Ruby, Rails, and Elixir is preferred (Python is also accepted).
  • Experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS.
  • In-depth knowledge of relational and distributed databases (e.g., CockroachDB, PostgreSQL, Riak) with experience in performance optimization and query tuning; experience with Kafka is a plus.
  • Experience applying design patterns to enhance reliability, scalability, and performance in application development.
  • Excellent problem-solving skills with experience diagnosing complex system issues in production environments.
  • Proven ability to work cross-functionally with product, application, infrastructure, and security engineering teams.
  • Strong written and verbal communication skills, with the ability to explain complex technical concepts.

Why Us

  • 100% remote flexibility (candidates located in California or New York are not eligible).
  • Competitive salary of $130,000 - $170,000 base + equity.
  • Outstanding medical and dental benefits, including 100% employer-paid healthcare for the whole family.
  • Company-paid life and disability insurance.
  • Optional vision and supplemental insurance, plus various Flexible Spending Accounts (FSAs).
  • Open Paid Time Off policy and 12 weeks of paid leave for new parents.
  • Matching 401(k) plan (5% up to $5,000 yearly).
  • $1,000 annual professional development stipend.
  • Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement.
  • LinkedIn Learning subscription.
  • Access to a company-paid professional coaching service.
  • Opportunities for remote employees to visit HQ in Durham, North Carolina.