It's fun to work in a company where people truly believe in what they are doing. At Dutch Bros Coffee, we are more than just a coffee company. We are a fun-loving, mind-blowing company that makes a difference one cup at a time.
Position Overview:
As a Senior Site Reliability Engineer, you are a technical leader who combines software engineering principles with systems operations expertise to design, build, and maintain infrastructure automation and to develop tools that improve systems reliability, handle incidents, and reduce manual operational interventions across a diverse multi-cloud enterprise. You will blend your software and systems engineering skills and focus on proactive problem-solving, automation, and continuous improvement by defining and then achieving measurable goals. You will collaborate closely with production teams, product managers, IT Ops, DevOps, and fellow developers to support the delivery of large-scale solutions that maintain high uptime and deliver excellent user experiences. You will participate in decision-making related to improving reliability programs across the enterprise. Additionally, you will provide technical guidance and mentorship to junior team members, contributing to the overall growth and success of the platform engineering team.
Job Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience), required
6 or more years of relevant experience, with a strong software engineering background (Python, Java, Golang) and systems administration expertise
Deep knowledge of operating systems (especially Linux), networking, and system administration tools
Experience with cloud platforms (AWS, Azure) and cloud-native technologies
Proficiency with automation and DevOps tools (Terraform, Ansible, Jenkins, etc.)
Ability to design and implement robust monitoring systems and analyze system metrics
Excellent problem-solving and analytical skills with proven experience in identifying root causes and developing effective solutions
Familiarity with microservices architecture and container orchestration
Experience collecting and analyzing data to help teams draw conclusions and gain insights from their systems
Excellent communication skills with the ability to collaborate effectively with engineers, developers, and business stakeholders
Location Requirement:
This role is located in Tempe, Arizona. This position is required to be in office 4 days per week (Mon-Thurs); Fridays are optional remote work days.
Key Result Areas (KRAs):
Develop software and implement processes and tools that provide continuous improvements in systems reliability and availability:
Ensure systems are consistently accessible to users while minimizing downtime
Optimize systems throughput by continually and effectively analyzing and addressing latencies, traffic patterns, errors, and saturation metrics
Anticipate future demand and ensure infrastructure can scale to peak loads effectively
Use automation tools and processes to streamline repetitive tasks in SDLCs and increase operational efficiencies
Quantify acceptable levels of downtime and errors in systems in support of their service level objectives
Align and help drive execution of the Platform Development team’s strategies
Lead the implementation of observability, availability, and monitoring tools and practices:
Implement monitoring systems for tracking key metrics
Configure real-time alerting, system performance analytics, and issue detection
Support incident response efforts by providing metrics, insights, and diagnostics, and contribute to the remediation and restoration of services
Conduct thorough post-incident reviews to identify root causes, learn from failures, and implement preventative measures
Other duties as assigned
Skills:
Advanced Experience with Observability Platforms and Practices
Advanced Experience with Automated Testing Tools
Strong Proficiency in Programming Languages
Strong Systems Administration Skills
Strong Analytical and Problem-Solving Skills
Proficiency With DevOps and Automation Tools and Practices
Performance Optimization
Leadership and Mentoring
Physical Requirements:
In-Office Environment: Must be able to work in a busy, crowded, and loud office with frequent distractions and interruptions
Must be able to collaborate in person, including occasional impromptu meetings
Office Conditions: Adaptability to typical office conditions, which may include exposure to air conditioning, heating, artificial lighting, and varying noise levels
Mobility: Ability to sit, stand, reach, twist, stretch, and work at a desk for long stretches. Must be able to occasionally move or lift office items up to 25 pounds
Hearing Requirements: Hearing must be sufficient or correctable to ensure clear understanding of spoken information, including participating in virtual meetings and phone calls. Use of hearing aids or other assistive devices is acceptable if needed.
Reading and Writing Proficiency: Ability to read and write in English is essential for processing documents, drafting reports, and following up on necessary actions. Proficiency in written communication is required to handle job-related tasks effectively.
Vision Requirements: Vision must be adequate or correctable to perform essential job duties, such as reading documents on a computer screen and using other visual tools. Use of corrective lenses or other measures to meet visual requirements is expected if needed.
Technology Proficiency: Must be proficient in operating a computer and other office productivity tools such as printers, scanners, and collaboration software.
Effective Communication: Must possess strong verbal and written communication skills to interact effectively with team members, clients, and other stakeholders via email, video conferencing, and other in-office communication tools.
Compensation:
DOE
If you like wild growth and working in a unique and fun environment, surrounded by a positive community, you'll enjoy your career with us!