Title: DevOps/SRE/Release Manager
Location: Jersey City, NJ (Day 1 Onsite)
Duration: 6+ Months
Experience: 10+ years in DevOps required
Skills: Prometheus, Grafana, Terraform, AWS, GCP, Azure, Docker
Position's General Duties and Tasks
In this role you will be responsible for:
Site Reliability Engineering (SRE):
· Monitor, maintain, and improve the reliability, availability, and performance of critical services.
· Develop and implement monitoring solutions (e.g., Prometheus, Grafana) to track system health and performance.
· Automate repetitive tasks and improve infrastructure efficiency using Terraform or similar tools.
· Create and maintain Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs) to drive reliability improvements.
· Participate in on-call rotations to handle incident response, root cause analysis, and mitigation strategies.
Release Management:
· Manage the end-to-end software release lifecycle, ensuring timely and smooth releases across all environments.
· Work with different teams to coordinate and validate code releases.
· Create and maintain a release calendar in collaboration with product and engineering teams to plan upcoming deployments.
· Troubleshoot issues during the release process and ensure post-release validation.
· Track and report release metrics to identify improvement opportunities and minimize downtime.
DevOps:
· Design, implement, and manage CI/CD pipelines (e.g., Jenkins, GitLab CI) to support continuous integration and deployment.
· Develop Infrastructure as Code (IaC) practices to manage infrastructure environments, using tools such as Terraform or AWS CloudFormation.
· Collaborate with development teams to create scalable solutions that meet business and technical requirements.
· Support containerization and orchestration efforts (Docker, Kubernetes) for application deployments.
· Drive adoption of DevOps best practices across teams, fostering a culture of automation and agility.
Requirements for this role include:
· Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
· 5+ years of experience in SRE, DevOps, or related roles.
· Strong knowledge of cloud platforms (AWS, GCP, Azure) and cloud-native infrastructure.
· Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK Stack).
· Proficiency in scripting languages (Python, Bash, Go) and automation tools (Ansible, Puppet, Terraform).
· Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab, CircleCI).
· Experience with release management, managing production deployments, and ensuring stable releases.
· Familiarity with containerization technologies (Docker) and orchestration tools (Kubernetes).
· Strong problem-solving skills, attention to detail, and the ability to work in a fast-paced environment.
· Knowledge of GitOps, chaos engineering, and incident management tools (PagerDuty, Opsgenie).