Site Reliability Engineer

Jobsbridge
Full-time
On-site
San Francisco, California, United States

Job Description

Responsibilities: 


• Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes. 

• Troubleshoot issues across the entire stack. Solve problems relating to mission-critical services and build automation to prevent recurrence, with the goal of automating the response to all non-exceptional service conditions.

• Identify and drive opportunities to improve automation.

• Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning.

• Participate in periodic on-call duties.

• Represent the SRE team in design reviews and operational readiness exercises for new and existing services.


Minimum qualifications: 


• BS degree in Computer Science or related technical field, or equivalent practical experience. 

• 5+ years of experience managing services in an internet-scale *nix environment.

• Practical knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies, and software design practices.

• Experience with one or more of Java, Tomcat, Elasticsearch, or MySQL, or scripting experience in Shell and Python.

• Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols. 

• Strong hands-on experience with configuration management tools such as Ansible, Puppet, or Chef.

• Solid understanding of networking fundamentals, including TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, the OSI layers, and load balancing.

• Must work well with, and be able to influence, a wide range of personalities at all levels.

• Ability to prioritize tasks and work independently.

• Must be adaptable and able to focus on the simplest, most efficient, and most reliable solutions.

• Track record of successful practical problem solving, with excellent written communication, interpersonal, and documentation skills.



Desired qualifications: 


• Expertise in designing, analyzing and troubleshooting large-scale distributed systems. 

• In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work). 

• Familiarity with algorithms, data structures, and complexity analysis.

• Hands-on experience with Java and Apache optimization, performance tuning, and configuration.

• Systematic problem-solving approach, coupled with a strong sense of ownership and drive.

Qualifications

Linux Administration, Tomcat, Puppet

Additional Information

Multiple Openings