Site Reliability Engineer
Zama
This job is no longer accepting applications
Job Description
We are looking for an experienced Site Reliability Engineer (SRE) with a strong background in application monitoring, incident response, and Kubernetes to join our infrastructure team. In this role, you will be responsible for ensuring the reliability, performance, and availability of our blockchain services, web applications, and payment gateways. You will develop and implement monitoring strategies, respond to incidents, and continuously improve our infrastructure’s resilience. ⚙️🚀
Key Responsibilities:
🔍 Monitoring and Observability: Design, implement, and maintain comprehensive monitoring solutions for our blockchain applications using tools like Prometheus and the Grafana stack. Ensure real-time visibility into system performance, availability, and health.
🚨 Incident Response: Develop and refine incident response protocols to quickly and effectively address system outages, performance issues, and security incidents. Lead incident response efforts, including root cause analysis and post-mortem reporting.
🤖 Automation: Automate repetitive tasks related to monitoring, incident response, and infrastructure management using scripting languages like Python or Bash.
📊 Service Levels: Define and measure service level objectives (SLOs) for critical services. Continuously monitor and adjust strategies to meet these objectives and ensure optimal service reliability.
📈 Capacity Planning: Monitor system capacity and performance trends, and make recommendations for scaling infrastructure to meet growing demand.
🤝 Collaboration: Work closely with DevOps, blockchain developers, and security teams to ensure seamless integration of monitoring and incident response protocols into our overall infrastructure.
📝 Documentation: Maintain thorough and up-to-date documentation of monitoring setups, incident response protocols, and troubleshooting guides.