Site Reliability Engineer [Multiple Positions Available]

JPMorganChase
1 day ago
Full-time
On-site
Plano, Texas, United States
Description

Duties: Build automations using programming languages to reduce manual effort. Define and collect metrics from systems and applications using industry-standard applications or custom-built processes. Design and develop visualizations of system health. Respond to incidents of system instability or unavailability, diagnosing problems, writing software to resolve issues, and performing Root Cause Analyses to determine the reason for an outage. Perform system logging analysis to ensure application stability. Troubleshoot system and network issues to find potential areas for improvement.

QUALIFICATIONS:

Minimum education and experience required: Master's degree in Computer Engineering, Computer Science, Electronic Engineering, Computer Information Systems, or related field of study plus two (2) years of experience in the job offered or as Site Reliability Engineer, Software Engineer, Software Developer, or related occupation. The employer will alternatively accept a Bachelor's degree in Computer Engineering, Computer Science, Electronic Engineering, Computer Information Systems, or related field of study plus five (5) years of experience in the job offered or as Site Reliability Engineer, Software Engineer, Software Developer, or related occupation.

Skills Required: This position requires experience with the following: Monitoring platform and application health, including CPU, memory, disk capacity, and API responses using Dynatrace or Datadog; Logging queries and performing analysis for incident troubleshooting using Elasticsearch or AWS CloudWatch; Managing incidents and conducting blameless post-mortems; Designing and developing APIs to support data collection or task automation using Python, Java, Spring Boot, or C#.NET; Automating manual tasks using Microsoft PowerShell or Bash; Implementing observability using white-box and black-box monitoring; Managing incidents using service level objective alerting; Performing telemetry collection for observability using Dynatrace, Prometheus, Datadog, AWS CloudWatch, and Splunk; Developing dashboards to display system, application, and business metrics using Grafana or Splunk; Implementing continuous integration and delivery using Jenkins and Terraform; Managing containers and container orchestration using ECS, Kubernetes, and Docker; Troubleshooting Transmission Control Protocol, Internet Protocol, API communications, and client-server computing to diagnose and resolve application and system failures using Dynatrace and Wireshark.

Job Location: 8181 Communications Pkwy, Plano, TX 75024.

Full-Time.