Site Reliability Engineer

Cognition | AI Software company
San Francisco, United States | $260,000 - $300,000 | Senior
Software Engineering

About the role

Ensure production reliability and build platform engineering tools to enable fast, safe shipping.

  • Owner of production reliability and platform engineering for Devin and Windsurf, ensuring high availability and fast incident resolution while enabling fast, safe shipping.

Key Responsibilities

  • Define and own SLOs, SLIs, and error budgets, and build observability systems.
  • Lead incident response, on-call rotations, and blameless postmortems.
  • Own CI/CD pipelines, deployment infrastructure, and developer tooling.
  • Manage cloud infrastructure as code and perform capacity planning.
  • Integrate security into reliability and drive a reliability-first culture.

Requirements

  • Deep experience running production systems at scale with SLOs and incident command.
  • Strong software engineering fundamentals, including writing production code for SRE tasks.
  • Proficiency with cloud providers (AWS/GCP/Azure), Kubernetes, and Terraform or equivalent.
  • Experience building CI/CD and deployment infrastructure for fast-moving teams.

Tech stack

CI/CD, Terraform, Kubernetes, AWS, Google Cloud, Azure, Datadog, Prometheus, Grafana
