Description:
We are seeking an experienced and dynamic Manager of Infrastructure Platform Engineering to lead and manage a team responsible for designing, building, and maintaining our infrastructure platform. The ideal candidate will have a strong background in cloud infrastructure, automation, and platform engineering, and will work closely with cross-functional teams to ensure the platform supports the organization’s growth, scalability, and reliability needs.
Your Impact
- Team Leadership - Lead, mentor, and grow a team of platform engineers, fostering a culture of collaboration, innovation, and continuous improvement
- Platform Development - Drive the design, implementation, and maintenance of cloud-based infrastructure platforms (e.g., AWS, Azure, GCP) and on-prem solutions to support company applications and services
- Automation & Scalability - Advocate for and implement automation practices, including Infrastructure-as-Code (IaC) using tools like Ansible, Terraform, CloudFormation, or similar; ensure platforms are highly available, scalable, and cost-efficient
- Cross-Functional Collaboration - Partner with SRE and product teams to align infrastructure strategy with business goals, ensuring that infrastructure needs are met while balancing performance, cost, and security requirements
- Monitoring & Performance - Establish and maintain effective monitoring, alerting, and performance metrics for infrastructure health, system availability, and resource utilization; proactively address performance bottlenecks and reliability issues
- Security & Compliance - Work with the security team to integrate security best practices into the infrastructure platform, ensuring that systems are compliant with regulatory requirements and secure by design
- Budget & Cost Management - Collaborate with senior leadership to manage the infrastructure budget, ensuring that resources are allocated efficiently and cost-optimization opportunities are identified and acted upon
Qualifications
Your Experience
- 7+ years of experience in infrastructure software development, site reliability engineering, or DevOps, preferably with 3+ years in a leadership role
- Experience managing large-scale, cloud-based infrastructure (e.g., AWS, Azure, GCP) and on-premises data centers
- Deep knowledge of distributed systems, containerization (e.g., Docker, Kubernetes), microservices architectures, and cloud-native technologies
- Extensive experience with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic)
- Strong knowledge of infrastructure-as-code tools and frameworks (e.g., Terraform, Ansible, CloudFormation)
- Expertise in automation frameworks and scripting languages (e.g., Python, Bash, Go)
- Experience with incident management, root cause analysis, and post-mortem culture
- Excellent problem-solving skills and the ability to thrive in a fast-paced, high-pressure environment
- Exceptional leadership, interpersonal, and communication skills
- Advanced degree in Computer Science, Engineering, or a related technical field, or equivalent military experience
- Experience in multi-cloud environments and hybrid cloud architectures
- Familiarity with service mesh technologies (e.g., Istio) and cloud-native security practices
- Experience with cost management and optimization in cloud environments