Senior Reliability Operations Engineer

Serve Robotics

Robotic Delivery company

Stockholm, Sweden

Senior

Software Engineering

About the role

Lead incident response and operational reliability for robotic and cloud systems.

Key Responsibilities

•Serve as the primary incident lead during your region’s daytime hours, coordinating technical investigations, centralizing communication, and engaging the appropriate engineering and SRE teams when escalation is required.
•Respond to escalations from Tier 1 support, using runbooks, metrics, logs, and system diagnostics to investigate and remediate issues or determine when escalation to Tier 3 is necessary.
•Develop and update runbooks, workflows, and operational documentation to ensure consistent and reliable responses to recurring issues, collaborating with product teams to expand coverage over time.

Requirements

•Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.
•5+ years of professional experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
•Demonstrated experience owning or participating in Tier 2 or Tier 3 technical investigations, including triage, log analysis, and structured escalation.

Tech: Linux, Grafana, Prometheus, Google Cloud, Jira, CI/CD, Kubernetes, PagerDuty

Level: Senior