This role is no longer accepting applications via Rocketlist.
About the role
Design, implement, and maintain observability pipelines using OpenTelemetry.
- Join dLocal as a Site Reliability Engineer to focus on observability using OpenTelemetry.
- You will design, implement, and maintain observability pipelines.
Key Responsibilities
- Own OpenTelemetry Pipelines
- Empower Engineering Teams
- Support Incident Management
- Collaborate Across Teams
- Automate Observability Infrastructure
- Define Baseline Observability Standards
- Own Technical and Security Health
- Optimize Alerting Systems
Requirements
- Over 4 years of experience as an SRE or in a very similar role with a stronger focus on observability.
- Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices.
- Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization.
- Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog.
- Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar).
- Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows.
- Strong scripting abilities (Python, Go, or similar) for automating observability tasks.
- A problem-solving mindset, with the ability to collaborate across multi-functional teams to drive reliability improvements.
Tech stack
Kubernetes, Grafana, Prometheus, New Relic, Datadog, Terraform, ArgoCD, GitHub Actions, PagerDuty, Jira, Python, Go, AWS, ECS, Ansible, Chef
Match insights
Tech: Kubernetes, Grafana, Prometheus, New Relic, Datadog
Level: Senior