DESCRIPTION:
Duties: Demonstrate and support site reliability culture and practices and exert technical influence. Lead initiatives to improve the reliability and stability of CCB applications and platforms, using data-driven analytics to improve service levels. Collaborate with team members to identify comprehensive service level indicators. Work with stakeholders to establish reasonable service level objectives and error budgets with customers. Demonstrate a high level of technical expertise within one or more technical domains and proactively identify and solve technology-related bottlenecks. Act as the main point of contact during major incidents for applications. Identify and solve issues quickly to avoid financial losses. Document and share knowledge within the organization via internal forums and communities of practice.
QUALIFICATIONS:
Minimum education and experience required: Bachelor's degree in Information Technology, Engineering, Computer Applications, Computer Information Systems, or related field of study plus 7 years of experience in the job offered or as Site Reliability Engineer, Architect, or related occupation. The employer will alternatively accept a Master's degree in Information Technology, Engineering, Computer Applications, Computer Information Systems, or related field of study plus 5 years of experience in the job offered or as Site Reliability Engineer, Architect, or related occupation.
Skills Required: This position requires three (3) years of experience with the following: implementing practices within a distributed application hosted on public and private cloud, covering reliability, scalability, performance, security, enterprise system architecture, toil reduction, automation, observability, and SLI/SLO. This position requires experience with the following: utilizing programming languages and frameworks including Python, Java, and Spring Boot for toil reduction and automation; observability including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools including Grafana, Dynatrace, Prometheus, Datadog, and Splunk; continuous integration and continuous delivery tools including Jenkins, GitLab, and Terraform to cover application building and deployments on public and private cloud environments, to deploy code to production more quickly and reliably, and to standardize the build, test, and deployment process, reducing human error and ensuring deployments are performed; using containers and container orchestration including ECS, Kubernetes, EKS, Private Cloud, and Docker, and troubleshooting issues on these platforms; troubleshooting technologies and issues; identifying and solving problems related to complex data structures and algorithms.
Job Location: 8181 Communication Parkway, Plano, TX 75024.