Director of Site Reliability Engineering

Meridian Cooperative
Full-time
On-site
Atlanta, Georgia, United States

Description

Meridian Cooperative is looking for a Director of Site Reliability Engineering to join a team of passionate innovators and problem-solvers, empowered to rise above challenges and swarm around solutions. Here, at our Dunwoody office, we are energized by the fact that our work is important. We are driven to make work as easy as possible for our Members, Customers, Partners, and Employees. Help us lead the way in utility software, join a winning company, and thrive. In-office presence is required. The role is hybrid and will be performed out of Dunwoody, GA.

Job Summary: 

We are seeking an experienced Director of Site Reliability Engineering (SRE) to join our team. As Director of SRE, you will manage the SRE team and ensure the reliability, performance, and scalability of our systems and services. The SRE team collaborates closely with the DevOps and Development teams to improve system stability, automate repetitive tasks, and proactively address issues before they affect customers.

Essential Functions:

  • Direct SRE department operations, staffing, management development, and training to promote interdepartmental collaboration, engagement, and achievement of annual business objectives.
  • Oversee SRE team members responsible for ensuring the reliability, performance, and scalability of our systems and services.
  • Develop comprehensive goals and expectations for maintaining high standards of performance across the SRE team. Make effective and efficient use of resources and set high, achievable aspirations for SRE personnel to align with the organization’s goals and objectives.
  • Lead SREs who specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with varied interests in algorithms and distributed systems.
  • Develop and implement monitoring, alerting, and incident management processes to ensure system health and performance.
  • Collaborate with development and operations teams to improve the overall availability and performance of our services.
  • Identify and resolve performance bottlenecks, system failures, and issues related to scalability.
  • Automate repetitive tasks and processes, such as deployments, scaling, and monitoring, to improve efficiency and reduce human error.
  • Develop and maintain tools for continuous integration, automated testing, and continuous deployment.
  • Conduct root cause analysis of incidents and implement long-term solutions to prevent recurrence.
  • Maintain and enhance disaster recovery, backup, and failover strategies to ensure high availability and data integrity.
  • Stay up to date with the latest technologies and trends in DevOps, cloud computing, and system reliability.
  • Take the initiative in thought leadership, innovation, and creativity.
  • Represent the company at conferences and networking events.
  • Adhere to all Meridian corporate policies and procedures.
  • Travel as required.
  • Any additional responsibilities assigned by management.

Requirements:

  • Bachelor's degree.
  • Seven years of applicable experience and a minimum of three years in a leadership role.
  • AWS Certified Cloud Practitioner Certification
  • AWS Certified Solutions Architect Certification

Skills:

  • Strong knowledge of cloud platforms (e.g., AWS, Google Cloud, Azure) and experience with cloud infrastructure management.
  • Proficiency in programming or scripting languages (e.g., Python, Go, Bash) for automation and system management.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI).
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
  • Experience with Infrastructure-as-Code tools such as Terraform and AWS CloudFormation templates.
  • Strong problem-solving skills and the ability to troubleshoot complex systems in real time.
  • Excellent communication and collaboration skills, with the ability to work cross-functionally with development, operations, and security teams.

What We Offer:

  • Outstanding Medical/Dental/Vision
  • Education/Training Reimbursement
  • On-Site Education Courses
  • Flexible Spending Account
  • Health/Wellness Reimbursement
  • Excellent Life and AD&D insurance
  • Paid Time Off: Eligible to begin accrual from date of hire; accrual amount based on years of service. Beginning accrual rate equivalent to 22 days per year. Nine holidays, including the day after Thanksgiving and Christmas Eve. Up to 240 hours of PTO can roll over to the following year.
  • Volunteer Time: 8 hours per year
  • Retirement: Robust 401(k). Following one year of eligible service, the Company contributes in two ways: (1) a match of 100% of each dollar you contribute on the first five percent (5%) of eligible compensation, and (2) an Employer basic contribution of 4% of base salary (with increases in the basic contribution percentage based on years of service). Employees are 100% vested in Company-funded contributions from the date they enter the plan.

In addition to a competitive salary, a medical/dental/vision plan, and a matching 401(k), we also offer:

  • Relaxed Dress Code
  • Flexible Hybrid Work Schedules
  • In-Office Gym

About Us:

We were formed in 1976 by a group of Electric Membership Cooperatives with a vision for a single enterprise solution provider to serve the data processing, IT, and operational needs of cooperatives, public utility districts, and municipal utilities. Through carefully curated acquisitions and partnerships, Meridian has unified multiple leading-edge companies under its umbrella to execute that vision. Today, the Meridian Suite serves over 500 utilities across the country with industry-leading enterprise software solutions.