Senior Site Reliability Engineer

Alembic · Causal AI, company

San Francisco · Senior

Software Engineering

About the role

Help scale the platform with reliability, observability, and operational excellence.

•We're looking for an experienced Site Reliability Engineer (SRE) to help us scale our platform with reliability, observability, and operational excellence at the core.
Key Responsibilities

•Design, build, and maintain scalable infrastructure to support real-time analytics and machine learning workloads
•Improve system reliability and performance through automation, observability, and proactive capacity planning
•Own and evolve CI/CD pipelines, deployment automation, rollback mechanisms, and config management

Requirements

•8+ years of experience in SRE, DevOps, or infrastructure engineering roles
•5+ years of experience with datacenter operations and/or system and network administration
•Experience with containerization (Docker) and orchestration (Kubernetes)


Tech stack

Docker, Kubernetes, CI/CD, GitHub Actions, ArgoCD, Ansible, Prometheus, Grafana, Datadog, Terraform, Bash, Python, Airflow

Match insights

Tech: Docker, Kubernetes, CI/CD, GitHub Actions, ArgoCD

Level: Senior
