Senior Reliability Operations Engineer
Serve Robotics
Robotic Delivery company
Stockholm, Sweden
Senior
Software Engineering
About the role
Lead incident response and operational reliability for robotic and cloud systems.
Key Responsibilities
- Serve as the primary incident lead during your region’s daytime hours, coordinating technical investigations, centralizing communication, and engaging the appropriate engineering and SRE teams when escalation is required.
- Respond to escalations from Tier 1 support, using runbooks, metrics, logs, and system diagnostics to investigate and remediate issues or determine when escalation to Tier 3 is necessary.
- Develop and update runbooks, workflows, and operational documentation to ensure consistent and reliable responses to recurring issues, collaborating with product teams to expand coverage over time.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.
- 5+ years of professional experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
- Demonstrated experience owning or participating in Tier 2 or Tier 3 technical investigations, including triage, log analysis, and structured escalation.
Tech stack
Linux, Grafana, Prometheus, Google Cloud, Jira, CI/CD, Kubernetes, PagerDuty