We are looking for a talented Senior Site Reliability Engineer (SRE) to ensure the reliability, observability, and scalability of a globally distributed payments platform.
We are a financial technology company that provides an open payments platform, enabling and optimizing digital transactions with a comprehensive payment services marketplace.
Job Duties
- Ensure the reliability, availability, and performance of a globally distributed payments platform, processing billions monthly, through monitoring, automation, and continuous improvement.
- Collaborate with development teams to improve the reliability and performance of applications.
- Implement and maintain robust observability solutions, enabling proactive identification, alerting, and resolution of issues.
- Lead incident response efforts, including participating in a shared on-call rotation to maintain 24/7 system reliability, performing root cause analysis, and implementing preventative measures.
- Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity.
- Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks.
- Lead by example, infusing modern SRE best practices and fostering a culture of reliability and performance.
- Provide technical guidance and mentorship to team members.
Ideal Background
- Hands-on experience with Datadog, OpenTelemetry, Sentry, Sumo Logic or similar monitoring and observability platforms.
- Proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code; experience with Ruby, Rails, and Elixir is preferred (Python is also accepted).
- Experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS.
- In-depth knowledge of relational databases (e.g., CockroachDB, PostgreSQL) and distributed data stores such as Riak, with experience in performance optimization and query tuning; experience with Kafka is a plus.
- Experience applying design patterns to enhance reliability, scalability, and performance in application development.
- Excellent problem-solving skills with experience diagnosing complex system issues in production environments.
- Proven ability to work cross-functionally with product, application, infrastructure, and security engineering teams.
- Strong written and verbal communication skills, with the ability to explain complex technical concepts.
Why Us
- 100% remote flexibility (candidates located in California or New York are not eligible).
- Competitive salary of $130,000 - $170,000 base + equity.
- Outstanding medical and dental benefits, including 100% employer-paid healthcare for the whole family.
- Company-paid life and disability insurance.
- Optional vision and supplemental insurance, plus various Flexible Spending Accounts (FSA).
- Open Paid Time Off policy and 12 weeks of paid leave for new parents.
- Matching 401(k) plan (5% up to $5,000 yearly).
- $1,000 annual professional development stipend.
- Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement.
- LinkedIn Learning subscription.
- Access to a company-paid professional coaching service.
- Opportunities for remote employees to visit HQ in Durham, North Carolina.