About the role
Alpaca seeks a Site Reliability Engineer to maintain and improve the reliability, observability, and operability of its brokerage platform, with a significant focus on PostgreSQL database reliability.
Key responsibilities
- Operate production day-to-day: on-call, incident response, and postmortems.
- Define and refine SLIs/SLOs and manage error budgets.
- Improve observability across metrics, logs, and traces.
- Ship infrastructure as code in a GitOps workflow for cloud and Kubernetes.
- Manage PostgreSQL: tuning, online migrations, HA/DR, and CDC.
Requirements
- 4+ years in SRE/DevOps/platform or backend engineering with production operations ownership.
- Hands-on Kubernetes and GitOps experience; strong Linux proficiency.
- Solid PostgreSQL production knowledge: query plans, indexing, migrations.
- Cloud networking fundamentals and observability experience.
Tech stack
Kubernetes, PostgreSQL, Go, Python, Linux
Match insights
Tech: Kubernetes, PostgreSQL, Go, Python, Linux
Level: Mid