This role will be 70-80% focused on working across non-functional requirements, production assurance and the DevOps team, building observability capability for two greenfield AWS applications. ITRS Geneos experience would be a valuable asset.

Design and implement observability solutions for AWS-based applications, including metrics, logging, and distributed tracing.
Define and deliver non-functional requirements (performance, scalability, reliability) for production systems.
Build and maintain monitoring dashboards and alerting systems to ensure application health and SLA compliance.
Collaborate with development and operations teams to integrate observability into CI/CD pipelines.
Automate deployment, monitoring, and alerting processes using DevOps best practices.
Conduct root cause analysis and improve incident response processes.
Drive production assurance activities and ensure resilience in low-connectivity environments.

Expertise in monitoring and observability tools (Prometheus, Grafana, ELK, OpenTelemetry).
Familiarity with ITRS Geneos for advanced monitoring and alerting.
Solid understanding of non-functional requirements and performance engineering.
Proficiency in scripting languages (Python, Shell, etc.).
Experience with containerisation (Docker, Kubernetes).
Strong troubleshooting and problem-solving skills.

Nice to have:
Knowledge of ITIL processes and production support best practices.