Site Reliability Engineer - W2

eTek IT Services
Contract
On-site
United States

Overview

The Site Reliability Engineer ensures the reliability, scalability, and performance of our infrastructure and applications. The role is central to maintaining high uptime and system efficiency, improving the user experience, and enabling the organization to meet its objectives.

Key Responsibilities

  • Design and implement monitoring and alerting systems to ensure high availability and performance of services
  • Develop automation tools for system provisioning, configuration management, and application deployment
  • Collaborate with cross-functional teams to ensure that new software and systems are production-ready
  • Perform capacity planning and manage infrastructure capacity efficiently
  • Conduct root cause analysis of production issues and implement preventive measures
  • Participate in on-call rotations and respond to system emergencies
  • Ensure compliance with security and regulatory standards in all aspects of the infrastructure
  • Contribute to the continuous improvement of the reliability and performance of systems and applications
  • Implement best practices for cloud infrastructure and services
  • Lead initiatives to optimize system performance and stability
  • Conduct periodic testing of disaster recovery and failover systems
  • Document system configurations, processes, and procedures
  • Assist in evaluating new technologies and methods to improve reliability and performance

Required Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field
  • 3+ years of experience in a site reliability engineering role
  • Proficiency in Linux system administration and troubleshooting
  • Strong programming skills in Python, shell, or other scripting languages
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Expertise in building and maintaining scalable, high-performance systems
  • Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
  • Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK)
  • Ability to design and implement automated solutions for infrastructure and application deployment
  • Excellent troubleshooting and problem-solving skills
  • Understanding of networking concepts and protocols
  • Strong communication and collaboration skills
  • Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer) are a plus