Staff Site Reliability & DevOps Engineer - Observability
Brandwatch · Social Media company
Remote · Lead
Software Engineering
About the role
Design, operate, and evolve observability platforms using Grafana and Prometheus.
This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting, primarily using Grafana and Prometheus.
You will ensure production systems are observable, reliable, and operable at scale, working closely with platform, infrastructure, and application teams.
Key Responsibilities
- Design, build, and operate observability platforms based on Grafana and Prometheus.
- Define and maintain metrics standards, dashboards, alerts, and SLOs.
- Support incident response by providing actionable telemetry and post-incident analysis.
- Automate observability configuration using infrastructure as code.
Requirements
- Strong experience with Prometheus and Grafana.
- Solid Linux and networking fundamentals.
- Experience running observability stacks in Kubernetes environments.
- Infrastructure as code experience (Terraform preferred).
Tech stack
Prometheus, Grafana, Linux, Kubernetes, Terraform