Job Description
Job Summary
We are seeking a hands-on L3 Support Engineer to serve as the escalation point for critical production issues across trading and risk systems.
This role focuses on:
- Deep incident diagnosis
- Partnering with engineering teams for fixes and reliability improvements
- Building tooling, runbooks, and documentation to enable L1/L2 teams to resolve issues more efficiently
Key Responsibilities
- Own L3 escalations end-to-end: triage, root cause analysis, remediation, and post-incident follow-up
- Collaborate with engineering and stakeholders to resolve critical production issues
- Partner with product and engineering teams to improve application reliability, operability, and supportability
- Improve the quality of logging, monitoring metrics, and alerts
- Develop scripts, runbooks, diagnostics, and tools to reduce mean time to resoluti...