Site Reliability Engineer

ONE

Full-time

Remote

United States

AWS

OnePay is a consumer financial services app with an exceedingly simple mission: to help people achieve financial progress.

Tens of millions of Americans today are unbanked or underbanked, meaning they don’t have enough money in savings to cover a minor emergency. They pay too much in fees, don’t have access to credit at affordable rates, and have little ability to grow their wealth. OnePay’s vision is to create a single app for consumers to save, spend, borrow, and grow their money, bringing our mission to life with simple and accessible banking, credit, and payments products that deliver a best-in-class experience to millions of customers. Our products include:

Checking and high-yield savings accounts
Domestic and international peer-to-peer payments
Credit Builder and credit score monitoring
Digital wallet / contactless payment solutions
Credit card program
Buy-now-pay-later installment loans at Walmart
Prepaid mobile service

Why do we have a right to win? We have the backing of Walmart (a Fortune 1) and Ribbit Capital (a preeminent fintech investor), are deeply embedded with the distribution of the world’s largest omnichannel retailer, and have an industry-leading multi-product value proposition — all in addition to having some of the best people and talent in the industry.

There’s never been a better time to build a category-defining business and there has rarely been a team better positioned for the opportunity. Join us!

The Role
As a Site Reliability Engineer at OnePay, you will play a critical role in ensuring the stability, scalability, and security of the systems that power our financial products, driving reliability practices across infrastructure, platform, and application teams to support millions of customers.

What You’ll Do

Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
Partner closely with product and platform engineering teams to embed reliability best practices in design, development, and deployment processes
Lead root cause analysis and postmortems, driving long-term improvements in resiliency and fault tolerance

You Bring

5+ years of experience as a Software Engineer with a focus on building and running reliable, large-scale, distributed systems in production
5+ years of operational experience with observability tooling and libraries (metrics, logging, tracing), including Datadog or similar tools (Prometheus, Grafana)
Proficiency in at least one programming language (Python, Go, Java, or Node.js preferred) for automation and tooling
Proficiency in incident management, participating in on-call rotations, and writing postmortem reports
Excellent collaboration skills with the ability to influence and educate product engineering teams on reliability and observability best practices
Hands-on experience with cloud platforms (AWS preferred), container orchestration (Kubernetes), and infrastructure-as-code (IaC) tools (Terraform, Pulumi)
Drive and proactivity – everyone here is a builder and executor

Tools we use

We use the fp-ts library for functional programming with Node/TypeScript running on top of Kubernetes and AWS. We believe great engineers can learn any stack, so you do not need experience with these specific tools, but you’ll ramp up more quickly if you are familiar with functional programming concepts.

What We Offer

Competitive base salary, stock options, and health benefits from Day 1
401(k) plan with company match
Remote-friendly (US), flexible time off (FTO), and opportunities for growth
A high-growth, mission-driven, inclusive culture where your work has real impact

Standard Interview Process

Initial Interview with Talent Partner
Technical or Hiring Manager Interview
Team Interview
Executive Interview
Offer!
