Description:
As a Senior Software Systems Engineer, you will play a key role in ensuring the observability and performance of our complex, distributed systems. You will work closely with our engineering and operations teams to implement and maintain observability solutions using our supported platforms, and drive continuous improvement in our ability to understand and troubleshoot system behavior.
This role calls for experienced engineers with excellent cross-functional and communication skills and backgrounds in operations, software development, systems performance, and resilience engineering, who always focus on the quality of the customer experience.
What You’ll Do:
- Design, implement, and maintain instrumentation for collecting metrics, logs, and traces across our software and infrastructure using our supported observability platforms.
- Develop and maintain monitoring and alerting systems using Splunk and Datadog to quickly identify and respond to system anomalies and performance issues.
- Develop Terraform-based IaC content to support the automated instrumentation of observability tooling.
- Collaborate with software development and infrastructure teams to ensure observability best practices are integrated into our Infrastructure as Code (IaC) environment.
- Monitor the performance of observability tooling for potential bottlenecks, identify possible solutions, and work with developers to implement fixes.
- Utilize telemetry data for in-depth troubleshooting, identifying root causes, and optimizing system performance.
- Leverage historical telemetry data to identify trends, plan capacity, and implement proactive measures for ongoing improvement.
- Liaise with vendors and other IT personnel for problem resolution.
- Participate in a rotating on-call incident response schedule that covers both weekdays and weekends.
- Improve operational efficiencies via scripting, bots and integrations.
What You’ll Need:
- 3+ years of experience with Infrastructure as Code using Terraform
- BS/MS degree in Computer Science, Engineering, or a related subject, or equivalent relevant experience
- Experience with observability tooling such as Splunk and Datadog
- Experience operating in the cloud, preferably AWS
- Proficiency in Linux
- Proficiency in a common scripting language (preferably Python)
- Excellent communication, negotiation, decision-making, and problem-solving skills