Site Reliability Engineer

Cognition | AI Software company

San Francisco, United States | $260,000 - $300,000 | Senior

Software Engineering

About the role

Ensure production reliability and build platform engineering tools to enable fast, safe shipping.

• Owner of production reliability and platform engineering for Devin and Windsurf, ensuring high availability and fast incident resolution while enabling fast, safe shipping.

Key Responsibilities

• Define and own SLOs, SLIs, and error budgets, and build observability systems.
• Lead incident response, on-call rotations, and blameless postmortems.
• Own CI/CD pipelines, deployment infrastructure, and developer tooling.
• Manage cloud infrastructure as code and perform capacity planning.
• Integrate security into reliability work and drive a reliability-first culture.

Requirements

• Deep experience running production systems at scale with SLOs and incident command.
• Strong software engineering fundamentals, including writing production code for SRE tasks.
• Proficiency with cloud providers (AWS/GCP/Azure), Kubernetes, and Terraform or equivalent.
• Experience building CI/CD and deployment infrastructure for fast-moving teams.

Tech stack

CI/CD, Terraform, Kubernetes, AWS, Google Cloud, Azure, Datadog, Prometheus, Grafana
