Job Description
Responsibilities
- Design, build, and maintain automation solutions to reduce manual operational tasks.
- Develop and implement Infrastructure as Code (IaC) for provisioning and managing environments.
- Monitor system health using logs, metrics, and observability tools to detect and prevent issues.
- Implement automated incident detection and response processes to reduce downtime.
- Collaborate with development teams to improve application reliability, performance, and scalability.
- Define, track, and support Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Improve system reliability through proactive analysis, performance tuning, and capacity planning.
- Build and maintain CI/CD pipelines to support continuous integration and deployment.
- Participate in incident management, root cause analysis, and post-mortem reviews.
- Suppor...