Site Reliability Engineer
CognitionAI, software company
San Francisco, United States
$260,000 - $300,000
Senior
Software Engineering
About the role
Ensure production reliability and build platform engineering tools to enable fast, safe shipping.
- Own production reliability and platform engineering for Devin and Windsurf, ensuring high availability and fast incident resolution while enabling fast, safe shipping.
Key Responsibilities
- Define and own SLOs, SLIs, and error budgets, and build observability systems.
- Lead incident response, on-call rotations, and blameless postmortems.
- Own CI/CD pipelines, deployment infrastructure, and developer tooling.
- Manage cloud infrastructure as code and perform capacity planning.
- Integrate security into reliability work and drive a reliability-first culture.
Requirements
- Deep experience running production systems at scale, including SLOs and incident command.
- Strong software engineering fundamentals, including writing production code for SRE tasks.
- Proficiency with cloud providers (AWS/GCP/Azure), Kubernetes, and Terraform or equivalent.
- Experience building CI/CD and deployment infrastructure for fast-moving teams.
Tech stack
CI/CD, Terraform, Kubernetes, AWS, Google Cloud, Azure, Datadog, Prometheus, Grafana