Site Reliability Engineer (M&O): W2 Role

Haramain Systems
Contract
Remote
Georgia and United States
Job Description

Role: Senior Associate – Monitoring & Observability (M&O)

Duration: Long-term contract

Location: Remote, with occasional travel required to Knoxville, TN


As a Senior Associate in Monitoring & Observability, you will play a hands-on role in implementing and supporting observability solutions across cloud and hybrid environments. You’ll collaborate with Cloud Operations, Architecture, and Platform Engineering teams to ensure reliable monitoring, actionable alerting, and effective incident response workflows. This role blends aspects of Observability, SRE, and Incident Management, with a focus on execution and continuous improvement rather than team leadership.


What you would do:

Implement and configure monitoring and alerting tools across cloud platforms to standardize observability practices.

Work with tools such as Splunk, OpenTelemetry, AWS CloudWatch, GuardDuty, and Wiz to deliver visibility across applications and infrastructure.

Build and maintain dashboards, alerts, and incident response workflows for proactive issue detection and resolution.

Support the integration of observability tools with ServiceNow ITOM and CMDB to enable automated incident and asset tracking.

Collaborate with engineering and operations teams to define SLOs/SLIs and reliability goals.

Participate in the automation of monitoring setups and ensure observability is embedded in CI/CD pipelines.

Troubleshoot monitoring and observability issues to improve system visibility and reduce MTTR.

Stay current with observability best practices and assist in introducing new tools and methods.


Skills & Experience Required

5+ years of experience in infrastructure engineering, systems operations, or platform support, with a strong focus on monitoring and observability.

Hands-on experience with tools such as Splunk, OpenTelemetry, AWS CloudWatch, GuardDuty, and Wiz.

Understanding of logging, metrics, tracing, and observability best practices.

Experience building dashboards, alerts, and reporting for system and application monitoring.

Familiarity with incident response workflows and ITSM/ITOM tools such as ServiceNow.

Exposure to cloud platforms (AWS, Azure, or GCP) and containerized environments (Docker, Kubernetes).

Knowledge of SRE concepts (SLOs, SLIs, error budgets) and their application in monitoring strategies.

Strong problem-solving and troubleshooting skills.

Good communication and collaboration skills in cross-functional teams.


Preferred qualifications

Certifications in cloud platforms or monitoring tools (e.g., AWS Cloud Practitioner, Datadog, Splunk).

Experience with infrastructure as code (Terraform, CloudFormation) to automate observability setup.

Exposure to AIOps or advanced analytics for proactive monitoring.

Experience in multi-cloud or hybrid observability environments.