Director of Site Reliability Engineering

Description:

We are strengthening the team and looking for a Director of SRE to lead our staff and ensure teams achieve our goals. The Wikimedia Foundation’s Site Reliability Engineering team is responsible for keeping our global top-10 website and other public-facing services healthy, and for further developing our infrastructure, platform and services in support of the Wikimedia Foundation’s mission. The SRE team comprises more than 45 creative and talented staff members, globally distributed and organized into seven teams, each with its own scope and focus area.

Responsibilities:

  • Your first priority: Lead multiple SRE teams in keeping Wikimedia’s sites and services (including Wikipedia) running responsively, reliably and securely. This includes protecting against outages, data loss or breaches, and accommodating and implementing Wikimedia’s Movement Strategy (including “Infrastructure for Open”).
  • Your second priority: Partner with engineering teams at Wikimedia to set direction and build platforms enabling transformative changes to Wikimedia’s user experience while ensuring appropriate operational review and support along the way.
  • Your foundation: An amazing Site Reliability Engineering team that’s taken us to more than half a billion users a month with passion, ingenuity, solid engineering practices and duct tape. Nurturing, growing, trusting and developing this team and its leaders is your path to success in this role.
  • Your values: You care about free and open information, and are committed to finding solutions to engineering problems in line with our guiding principles. You share our values and work in accordance with them.

Qualifications:

  • 8+ years of experience in site reliability engineering, technical operations, or infrastructure engineering roles
  • Track record of managing, inspiring and mentoring multiple managers and engineers, and aligning them across the organization and in the community
  • Experience managing large-scale projects, including technical deep dives into code, networking and operating systems
  • Experience developing and tracking department and project budgets
  • Experience in globally distributed, multi-site, high-traffic environments, preferably with both on-premises bare-metal and cloud-based infrastructure
  • Familiarity with open source development and community practices. Experience adopting and integrating open source solutions. A track record of upstream contributions (whether personal or through a team) is a huge plus
  • Familiarity with engineering team practices, and experience connecting SRE with the design, product and engineering teams responsible for continuous delivery of functionality
  • Familiarity with large website application architectures, including caching layers, storage scaling concepts, network infrastructure, monitoring systems, etc.
  • Experience with widely geographically distributed teams and follow-the-sun operations is a major plus. Personal cross-cultural experience (having lived or worked internationally) helps as well

Organization: Wikimedia Foundation
Industry: Engineering Jobs
Occupational Category: Director of Site Reliability Engineering
Job Location: New York, USA
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Experienced Professional
Experience: 8 Years
Posted: 2023-10-26 8:00 am
Expires: 2024-12-29