For our client, we are looking for an Observability DevOps Developer.
Key Responsibilities:
- Design and Implement Observability Solutions: Build and maintain observability tools (monitoring, logging, tracing) to ensure the health and performance of microservices running on AWS.
- Monitoring & Logging: Set up and optimize monitoring with tools such as Prometheus, Grafana, CloudWatch, OpenTelemetry (OTEL) and the Splunk stack to provide real-time insight into the AWS infrastructure.
- Distributed Tracing: Implement distributed tracing solutions (e.g., OpenTelemetry, Jaeger) to trace and debug service interactions across multiple microservices.
- Proactive Alerting: Establish alerting mechanisms to detect performance anomalies and potential failures in real-time.
- Dashboards & Reporting: Create dashboards and reports to monitor service-level objectives (SLOs), key performance indicators (KPIs), and overall system health.
- Incident Management: Investigate and troubleshoot issues, identify root causes, and provide insights that reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
- Collaboration with Teams: Work with DevOps and development teams to embed observability best practices into CI/CD pipelines and infrastructure as code (IaC).
- Automation & Optimization: Automate manual monitoring and incident management processes to reduce operational overhead.
Toolchain:
- Containers & Orchestration: Docker, Kubernetes
- Infrastructure: AWS
- Development & GitOps tools: GitLab, Argo CD, Harbor, SonarQube, Dependency Tracker, Git
- Observability support tools: OTEL, Splunk, PagerDuty, Apica, Grafana, Slack, Confluence, Jira
- Languages & IaC tools: Python, Java, JavaScript, TypeScript, Terraform, Ansible