Senior Reliability Operations Engineer
Serve Robotics
Robotic Delivery company
Penang, Malaysia
Senior
Software Engineering
About the role
Lead operational reliability for robotic and cloud systems in Malaysia.
The Senior Reliability Operations Engineer leads operational reliability by region, owning incident response, escalations, and Tier 2 support for robotic and cloud systems.
Key Responsibilities
- Respond to escalations from Tier 1 support, using runbooks, metrics, logs, and system diagnostics to investigate and remediate issues or determine when escalation to Tier 3 is necessary.
- Develop and update runbooks, workflows, and operational documentation to ensure consistent and reliable responses to recurring issues, collaborating with product teams to expand coverage over time.
- Write, maintain, and enhance automation scripts and tools that streamline common remediation steps, improve response times, and reduce manual operational overhead.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.
- 5+ years of professional experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
- Strong proficiency with Linux, including navigating systems, reviewing logs, and performing diagnostics.
- Experience writing, executing, and maintaining runbooks, automations, and operational workflows.
Tech stack
Linux, Grafana, Prometheus, Google Cloud, Jira, PagerDuty