Professionals who come into the SRE role are forward-thinkers with a unique background (typically at the intersection of demonstrated expertise in software development and traditional IT operations skills). A fundamental aspect of Site Reliability Engineering—that consistently makes SRE teams outperform other enterprise software development and support teams—is that SRE empowers engineers at all levels to develop their career paths and positively impact their team. SREs are encouraged and given the space to learn from each other and support each other’s growth.
Indeed, the SRE practice fosters strong leadership tracks. Today, a growing number of SRE-led organizations give Site Reliability Engineers the opportunity to become leaders without having to move from individual contributor roles into traditional management. However, this raises the question: What is the role of an SRE technical leader in this empowering environment?
Or better yet, when an SRE wants to become a technical leader, does that mean taking on a people manager role? Can (or should) an SRE technical leader guide their team’s efforts without doing people management?
What does an SRE team do?
The purpose of Site Reliability Engineering (SRE) is to continually ensure and improve the efficiency and reliability of a company’s services or systems. SRE defines how to achieve and measure reliability through the use of specific tools: Service-Level Objectives (SLOs), Service-Level Agreements (SLAs), and Service-Level Indicators (SLIs), all tied to the customer experience and business objectives.
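As a rough illustration of how these measures relate, the sketch below computes an availability SLI from request counts and checks it against an SLO target. It also derives the remaining error budget, a standard SRE concept that the text above does not name. The `Slo` class, the function names, the service name, and the sample numbers are all hypothetical and chosen only for this example; real teams define SLIs that fit their own services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability objective, e.g. 99.9% of requests succeed."""
    name: str
    target: float  # fraction between 0 and 1, e.g. 0.999


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the measurement window."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as fully available
    return good_requests / total_requests


def meets_slo(slo: Slo, good_requests: int, total_requests: int) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return availability_sli(good_requests, total_requests) >= slo.target


def error_budget_remaining(slo: Slo, good_requests: int, total_requests: int) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo.target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # No failures are allowed (100% target or no traffic in the window).
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    slo = Slo(name="checkout-availability", target=0.999)
    good, total = 999_500, 1_000_000  # made-up sample counts
    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Meets SLO: {meets_slo(slo, good, total)}")
    print(f"Error budget remaining: {error_budget_remaining(slo, good, total):.1%}")
```

An SLA would typically sit on top of this as a contractual commitment, usually set at a looser target than the internal SLO, so it is not modeled here.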
Depending on the maturity level of its SRE practice, the industry, and the organization’s size, SRE teams can organize themselves in various ways to accommodate the company’s specific needs. But, regardless of the type of SRE team implementation, all SREs work towards the common goal of keeping and enhancing site reliability.
Whether at the team level or as an individual contributor, Site Reliability Engineers are responsible for availability, latency, performance, efficiency, monitoring, emergency response, and capacity planning. For instance, SREs may divide their time between writing code, building software and systems to manage IT infrastructure through automation, and solving customer-facing issues. The specific tasks and time allocation depend on multiple factors, with no strict rule on how to break down SREs’ work.
SRE roles transition
An individual SRE’s role within a team will change as the SRE organization evolves. Not only is it natural for SRE teams to progress from one type of implementation to the next, but it is also common for team members to transition from responsibility to responsibility, ensuring everyone understands and actively contributes to achieving the team’s specific goals.
As opposed to DevOps teams, where titles tend to reflect the specific focus of a role (e.g., release manager, automation architect, software developer, security engineer), in an SRE team, every member is typically called a Site Reliability Engineer. Moreover, since there is no industry formalization of role titles, these often vary from one SRE organization to the next. Nonetheless, all SREs within a given team contribute to maintaining and increasing system reliability and performance.
In SRE, failure is expected and planned for. Instead of striving to build services that are 100% reliable, SREs embrace risk and focus on strengthening their system’s resiliency. The same principles apply to the team. SRE fosters continuous improvement and provides a safe space for failure. Throughout a team’s lifecycle, SREs will sometimes use their expertise and best skills to further the team’s work, while at other times, they will be given the opportunity to work on projects that help them upskill and expand their domain of expertise.
SRE teams are highly successful because of this culture of mutual teaching, shared responsibility, and ownership. As SRE allows every individual contributor (IC) to make an impact on the other team members, the role of the SRE manager transcends people management. Whether you are a technical leader or a people manager, being a leader may look the same. The difference comes from your background and the tasks you are working on.
What is the role of an SRE manager?
Within the realm of Site Reliability Engineering, being a leader does not necessarily entail being a people manager. Instead, present-day SRE managers are expected to contribute directly, hands on keyboard, while driving forward projects that have a broad impact across the SRE organization and influence how the entire company works.
The degree to which an SRE technical leader is hands-on or hands-off depends on each company’s size and SRE maturity level. Rather than seeing it as climbing ladders, SREs scale up their influence as the organization grows. Depending on the SRE organization’s evolution stage, there will be a certain amount of people management within the SRE technical leader’s responsibilities, but in the sense of coordinating multiple teams and guiding their efforts, defining their vision, and listening. As Todd Palino noted in his 2019 SREcon presentation, it also means being that “someone” in “someone should do something about this,” the person who makes SREs’ lives easier.
Why do we need technical leadership in SRE?
High-performing SRE teams consistently share knowledge and support each other’s growth, all with the common objective of increasing system reliability. So, how does the SRE manager fit into today’s empowered SRE organization? Is the role becoming obsolete? Traditionally speaking, perhaps. But in reality, the SRE manager role is evolving, shifting from a pure focus on people management to a unique combination of strong technical skills, business acumen, and leadership skills.
Today, the SRE technical leader does more than allocate tasks and projects. With a keen awareness of how customers use services and how services operate over the entire production environment, the SRE manager is an innovator, aligning SREs’ work with business goals. But most importantly, the SRE technical leader understands that people don’t scale vertically. Great SRE managers know that upskilling people is as important as updating technology and processes.
We need leadership within SRE to create the space for growth, support engineers in building their skills, and define their projects’ visions. Improving your impact as a technical leader starts with empowering your team.