Job Description
Senior Site Reliability Engineer – AI Operations
TechInsights is building the reliability and AI operations foundation for its next chapter: an AI-first intelligence platform that runs the most demanding semiconductor intelligence workflows in the world. We’re looking for a Senior Site Reliability Engineer who owns that foundation.
As a senior individual contributor at the technical leadership tier, you will own strategic reliability initiatives end‑to‑end: setting technical direction, defining SLOs and error budgets across our production platform, designing reliability patterns for AI agent pipelines, and enabling our development and AI Engineering teams to build and ship with confidence.
Platform Reliability & AI Operations
Own SLOs, SLIs, and error budgets for all production services; drive error budget discipline across engineering.
Design reliability patterns for AI agent pipelines: LLM observability, tool‑use tracking, failure detection, and g...