Site Reliability Engineer - W2

eTek IT Services

Contract

On-site

United States

Overview

The Site Reliability Engineer ensures the reliability, scalability, and performance of our infrastructure and applications. The role is responsible for maintaining high uptime and system efficiency, which supports the user experience and helps the organization meet its objectives.

Key Responsibilities

Design and implement monitoring and alerting systems to ensure high availability and performance of services
Develop automation tools for system provisioning, configuration management, and application deployment
Collaborate with cross-functional teams to ensure that new software and systems are production-ready
Perform capacity planning and manage infrastructure resources efficiently
Conduct root cause analysis of production issues and implement preventive measures
Participate in on-call rotations and respond to system emergencies
Ensure compliance with security and regulatory standards in all aspects of the infrastructure
Contribute to the continuous improvement of the reliability and performance of systems and applications
Implement best practices for cloud infrastructure and services
Lead initiatives to optimize system performance and stability
Conduct periodic testing of disaster recovery and failover systems
Document system configurations, processes, and procedures
Assist in evaluating new technologies and methods to improve reliability and performance

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field
3+ years of experience in a site reliability engineering role
Proficiency in Linux system administration and troubleshooting
Strong programming skills in Python, shell scripting, or other scripting languages
Experience with cloud platforms such as AWS, GCP, or Azure
Expertise in building and maintaining scalable, high-performance systems
Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK)
Ability to design and implement automated solutions for infrastructure and application deployment
Excellent troubleshooting and problem-solving skills
Understanding of networking concepts and protocols
Strong communication and collaboration skills
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer) are a plus

