Site Reliability Engineer

Wesco
Full-time
Remote
United States
Description

As the Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our customer-facing software applications. This role combines planning, engineering, monitoring, incident response, and administration to create highly scalable and fault-tolerant systems. You will handle complex and escalated application and network infrastructure support cases, including troubleshooting of Ethernet networking problems. You will develop, review, and approve customer-facing and internal documentation on best practices, troubleshooting flowcharts, training materials, and FAQs. You will act as a technical team lead, technical resource, and coach for the Associate Engineer – Support and Engineer – Support team members.

Responsibilities:

  • Collaborate with engineering, technical services, and quality assurance divisions on any problems, software bugs or emerging customer needs
  • Ensure the high availability and reliability of the production environment by monitoring system health and performance
  • Provide primary operational support for large-scale distributed software applications
  • Facilitate incident resolution via triage, communication, engagement, escalation, and documentation
  • Partner with platform administration (both internal and external) to define and achieve stability and scalability objectives
  • Collaborate with technical and quality teams to improve services by identifying areas of risk and helping to define and proactively implement solutions
  • Drive continual improvement in system performance by setting service level objectives in collaboration with a performance center of practice and/or product development teams
  • Participate in system design, capacity planning, and platform management 
  • Analyze and publish metrics from operating systems and applications to assist in performance tuning and fault finding
  • Pursue opportunities for automation and process improvements
  • Deliver exceptional customer service, providing infrastructure support per Service Level Agreements (SLAs)
  • Handle escalated cases, including troubleshooting of complex audio, video, and Ethernet networking problems
  • Handle security patch and vulnerability management
  • Evaluate, identify, and replicate issues, and follow the escalation process to reach a resolution that ensures a positive customer experience
  • Serve as a technical resource to other functional groups and individuals to improve service quality and user experience
  • Develop customer-facing and internal documentation on best practices, troubleshooting flowcharts, training materials, and FAQs to ensure a consistent customer experience
  • Take ownership of cases escalated by Associate Engineers and Engineers and see them through to resolution

Qualifications:

  • Bachelor's degree in an engineering-related discipline required; Master's degree preferred
  • Experience providing first-level incident response and troubleshooting with technical teams to resolve end-user issues
  • Proficiency with enterprise system monitoring software (examples: New Relic, Nagios, SolarWinds, Dynatrace, Datadog, Azure Monitor, Splunk)
  • Experience with cloud-based infrastructure, databases, and applications 
  • Experience with performance tuning and fault finding in large-scale distributed systems.
  • Experience with designing, implementing, and managing performance testing practices, including specific tools and frameworks
  • Knowledge of disaster recovery planning and execution.
  • Ability to effectively work in a highly matrixed organization
  • Strong understanding of coding, automation, and engineering principles to build resilient, self-healing systems
  • Familiarity with DevOps practices and tools
  • Experience with Jira (or an equivalent work management tool) and Confluence (or an equivalent knowledge management tool)
  • Licenses/Certificates/Designations - IT industry networking certifications such as CCNP or JNCIP; ITIL or equivalent
  • Minimum 5 years of experience supporting network and AV operations
  • Minimum 5 years of experience delivering support for Ethernet technologies, AV, and networking concepts
  • Advanced knowledge of platform OS (router platform, VXLAN, WAN, LAN, and routing protocols) and how they interact with the network
  • Ability to apply principles, theories, and concepts, as well as knowledge of related networking/AV disciplines
  • Advanced knowledge of, and adherence to, change management processes
  • Network routing & switching
  • Possess a customer-centric mindset
  • Possess strong computer skills, including proficiency with Microsoft Office Outlook, Word, Excel, and PowerPoint
  • Excellent oral and written communication
