Software Engineer, Site Reliability

Fal AI, Generative Media company
Remote, Senior
Software Engineering

About the role

Keep production infrastructure running at scale.

  • You are a seasoned SRE who keeps production infrastructure running at scale.

Key Responsibilities

  • Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
  • Build and maintain CI/CD pipelines and deployment infrastructure
  • Leverage AI to an extreme level to automate analysis and resolution of production issues, and improve software development speed, reliability and maintainability

Requirements

  • 5+ years experience in managing critical production systems and software development workflows
  • Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
  • Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS

Tech stack

Kubernetes, CI/CD, Linux, Terraform, Ansible

