Do you enjoy keeping systems reliable, performant, and scalable while continuing to grow your technical depth? As a Senior Site Reliability Engineer (SRE) / DevOps Engineer at Vantage, you'll contribute to the stability and efficiency of our distributed services. You'll work closely with the SRE and software engineering teams, helping to build, maintain, and automate infrastructure and processes that support our mission-critical services.
As a Senior Site Reliability Engineer (SRE) at our organization, you'll play a pivotal role in combining software and systems engineering to build, maintain, and enhance our mission-critical services. You'll be responsible for guaranteeing the reliability and uptime of both internal and external systems, all while driving continuous improvement at a rapid pace.
Minimum Requirements
- 6 years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role working with software and infrastructure.
- Proficiency with either Python or Bash.
- Hands-on experience with Azure.
- Familiarity with CI/CD pipelines and infrastructure as code (IaC), including tooling such as Terraform and Ansible.
- Demonstrated ability to triage and prioritize effectively when troubleshooting incidents.
- History of engaging effectively with cross-functional teams during events such as incident response and post-mortems.
- Track record of proactively tailoring infrastructure to meet the unique needs of the product it supports.
- Bonus: Experience with application deployment, data pipelines, and Snowflake and/or relational databases. Knowledge of AWS or GCP is also valuable.
Role Responsibilities
- Collaborate with a diverse team of software engineers, engaging in iterative processes and effective task planning to drive our projects forward.
- Take ownership of the availability, scalability, and performance of our services, proactively identifying issues and implementing automation to prevent problems from recurring.
- Participate in the on-call rotation, responding to incidents and working with the team to restore service and prevent recurrence.
- Contribute to automating infrastructure provisioning, configuration, and management using IaC principles with tools like Terragrunt and Ansible.
- Help design and enhance monitoring, logging, and alerting systems to improve observability and ensure system health.
- Participate in blameless post-mortems, documenting issues and following up on action items to foster a culture of learning and continuous improvement.
- Foster collaboration with other engineering teams, promoting the reuse of existing frameworks and gaining insights into their operation.
- Stay current with industry trends, emerging technologies, and best practices in SRE, DevOps, and automation.
About Vantage
Vantage is the first unified platform purpose-built for retail media orchestration, empowering enterprise retailers to seamlessly activate onsite, offsite, and in-store advertising. With a global presence in North America and Asia-Pacific, Vantage enables retailers to launch and grow their media networks through scalable technology and automated workflows, and is trusted by leading retailers like The Home Depot to power their retail media programs.
For a closer look at what we do, our culture, and our benefits, check out our About Us and Careers pages.