OnePay is a consumer financial services app with an exceedingly simple mission: to help people achieve financial progress.
Tens of millions of Americans today are unbanked or underbanked, and many don’t have enough money in savings to cover a minor emergency. They pay too much in fees, lack access to credit at affordable rates, and have little ability to grow their wealth. OnePay’s vision is to create a single app for consumers to save, spend, borrow, and grow their money, bringing our mission to life with simple and accessible banking, credit, and payments products that deliver a best-in-class experience to millions of customers. Our products include:
- Checking and high-yield savings accounts
- Domestic and international peer-to-peer payments
- Credit Builder and credit score monitoring
- Digital wallet / contactless payment solutions
- Credit card program
- Buy-now-pay-later installment loans at Walmart
- Prepaid mobile service
Why do we have a right to win? We have the backing of Walmart (a Fortune 1 company) and Ribbit Capital (a preeminent fintech investor), are deeply embedded in the distribution network of the world’s largest omnichannel retailer, and have an industry-leading multi-product value proposition, all in addition to having some of the best people and talent in the industry.
There’s never been a better time to build a category-defining business and there has rarely been a team better positioned for the opportunity. Join us!
The Role
As a Site Reliability Engineer at OnePay, you will play a critical role in ensuring the stability, scalability, and security of the systems that power our financial products, driving reliability practices across infrastructure, platform, and application teams to support millions of customers.
What You’ll Do
- Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
- Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
- Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
- Partner closely with product and platform engineering teams to embed reliability best practices in design, development, and deployment processes
- Lead root cause analysis and postmortems, driving long-term improvements in resiliency and fault tolerance
You Bring
- 5+ years of experience as a Software Engineer with a focus on building and running reliable, large-scale, distributed systems in production
- 5+ years of operational experience with observability tooling and libraries (metrics, logging, tracing), using Datadog or similar tools such as Prometheus and Grafana
- Proficiency in at least one programming language (Python, Go, Java, or Node.js preferred) for automation and tooling
- Proficiency in incident management, participating in on-call rotations, and writing postmortem reports
- Excellent collaboration skills with the ability to influence and educate product engineering teams on reliability and observability best practices
- Hands-on experience with cloud platforms (AWS preferred), container orchestration (Kubernetes), and infrastructure-as-code (IaC) tools (Terraform, Pulumi)
- Drive and proactivity: everyone here is a builder and an executor
Tools We Use
We use the fp-ts library for functional programming with Node/TypeScript, running on top of Kubernetes and AWS. We believe great engineers can learn any stack, so you do not need experience with these specific tools, but you’ll ramp up more quickly if you are familiar with functional programming concepts.
What We Offer
- Competitive base salary, stock options, and health benefits from Day 1
- 401(k) plan with company match
- Remote-friendly (US), flexible time off (FTO), and opportunities for growth
- A high-growth, mission-driven, inclusive culture where your work has real impact
Standard Interview Process
- Initial Interview with Talent Partner
- Technical or Hiring Manager Interview
- Team Interview
- Executive Interview
- Offer!