Site Reliability Engineering (SRE) - W2

eTek IT Services
Contract
On-site
San Mateo, United States
Required Skills
  • Must haves: 3 to 5 years of experience with Kubernetes, Datadog, cloud services, and large-scale systems on AWS and GCP, with minor Azure
  • GKE, EKS, home-strung on-prem clusters, and AKS (very small)
  • Consistent upgrades across all clusters and clouds
  • Nice to have: gaming experience

Required Qualifications 
  • 6+ years of demonstrated influence across one or more teams for large scale projects that drive impact and improvement across the organization 
  • 6+ years of experience in an SRE role for online services in a multi-region, multi-cloud environment, with specific experience in reliability and resiliency
  • 6+ years of developing tools to automate processes or augment off-the-shelf tool functionality
  • 6+ years of AWS and/or GCP cloud experience running highly elastic, mission-critical workloads
  • 6+ years of coding experience in at least one of Python, Ruby, Java, or Go, and a good understanding of code management
  • 6+ years of experience using Infrastructure as Code tools like Terraform, Pulumi, or others 
  • Extensive knowledge of software build, test, and deploy processes using Git, Jenkins, Puppet, Ansible, Docker/containers, and Kubernetes 
  • Experience with system analysis and troubleshooting 
  • Ability to serve as a mentor to junior engineers and provide technical leadership to the organization
Bonus Points 
  • Prior hands-on experience running large-scale multiplayer video games
  • Experience designing and crafting software for systems and network automation 
  • Debugging, code optimization, and routine task automation skills 
  • Demonstrated ability to decompose sophisticated problems and engage in lateral investigations