Companies running high-reliability services are getting better at defining their unique Site Reliability Engineering (SRE) needs and understanding which best practices to implement in their frameworks. However, the question of how to actually organize SRE teams remains difficult. Do you upskill your current team? Do you embed SREs within your software engineering (SWE) team, or do you build a separate team entirely?
There are different implementations of SRE teams that accommodate the various DevOps adoption stages and can exist simultaneously within an organization. As SREs gain experience, they will naturally progress from one type of implementation to another.
Google lists six types of SRE teams as observed throughout the evolution of its SRE practice. The six implementation types can be primarily grouped into two categories: embedded SREs and dedicated SRE teams.
Google’s SRE-only teams are highly specialized, each focusing on a specific responsibility such as maintaining shared services (infrastructure team), building software to improve system reliability (tools team), or running and scaling a critical application or business area (product/application team). These teams evolved from the first-ever SRE team at Google, known as the Kitchen Sink.
The Kitchen Sink or “Everything SRE” team is generally the first SRE team an organization puts in place, and initially the only one. It may expand organically over time, as in the Google example. The Kitchen Sink is recommended for companies that have outgrown what can be done without a dedicated SRE team but do not yet need multiple SRE teams.
Embedded SRE teams
These SREs are attached to a product, service, or application team. According to the Google approach, there is usually one SRE per team. The embedded SRE acts as a subject matter expert, working closely with their SWE counterparts, usually on a project basis. In addition, embedded SREs have a hands-on role, updating the code base and configuration of the services.
This type of implementation is best suited to starting an SRE function or scaling another team. By driving the adoption of SRE best practices, embedded SREs can help expand the SWE team’s positive impact.
Consulting
The consulting implementation derives from the embedded approach; the main difference is that consulting SRE teams rarely make code and configuration changes themselves. Also known as “Customer Reliability Engineers,” these SRE teams are recommended for large companies where the demand for SRE support has outgrown the capacity of the existing SRE teams.
Pros and Cons of Embedded SREs
As with DevOps, there is no single definitive guide to structuring SRE teams. The way you organize your SRE teams depends mainly on your organization’s maturity level. For example, if you are starting out on your SRE journey, you may want to consider assigning some engineering time to test out SRE-related practices. Although it may be time-consuming, this preliminary step allows you to evaluate your SRE needs and adapt the methodology accordingly, without significant investment or sudden organizational change.
Whether you are just embarking on your SRE implementation journey or need to scale your existing teams, start by evaluating your organization’s requirements. List the pros and cons of your existing SRE team implementation to better understand your team’s maturity level and decide which type of implementation to adopt next.