Service Reliability Engineering (SRE) Team
A Service Reliability Engineering (SRE) Team is a software engineering team that focuses on service reliability engineering tasks, which ensure that an information system can meet its service level agreements (SLAs) throughout its lifecycle.
- Context:
- They can (typically) contain SRE Engineers.
- They can (typically) collaborate with a Software Development Team.
- ...
- Example(s):
- a Google SRE Team.
- a NYTimes SRE Team.
- ...
- Counter-Example(s):
- See: Service Reliability.
References
2019
- "How SRE teams are organized, and how to get started."
- QUOTE: ... In this post, we’ll cover how different implementations of SRE teams establish boundaries to achieve their goals. We describe six different implementations that we’ve experienced, and what we have observed to be their most important pros and cons. Keep in mind that your implementations of SRE can be different—this is not an exhaustive list. In recent years, we’ve seen all of these types of teams here in the Google SRE organization (i.e., a set of SRE teams) except for the “kitchen sink.” The order of implementations here is a fairly common path of evolution as SRE teams gain experience. ...
2018
- https://sre.google/workbook/engagement-model/
- QUOTE: ... The developer and SRE teams both care about reliability, availability, performance, scalability, efficiency, and feature and launch velocity. However, SRE operates under different incentives, mainly favoring service long-term viability over new feature launches.
In our experience, developer and SRE teams can strike the right balance here by maintaining their individual foci but also explicitly supporting the goals of the other group. SREs can have an explicit goal to support the developer team’s release velocity and ensure the success of all approved launches. For example, SRE might state, “We will support you in releasing as quickly as is safe,” where “safe” generally implies staying within error budget. Developers should then commit to dedicating a reasonable percentage of engineering time to fixing and preventing the things that are breaking reliability: resolving ongoing service issues at the design and implementation level, paying down technical debt, and including SREs in new feature development early so that they can participate in design conversations. ...
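The quote treats "safe" release velocity as staying within an error budget. A minimal Python sketch of that idea follows. It assumes an availability SLO measured as the fraction of successful requests in a fixed window. The `ErrorBudget` class, the `release_allowed` function, and the "freeze launches once the budget is spent" policy are hypothetical illustrations, not Google's tooling or a prescribed policy.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ErrorBudget:
    """Hypothetical helper: an availability error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served so far in the SLO window
    failed_requests: int  # requests that counted against the SLO in that window

    def allowed_failures(self) -> Fraction:
        # Budget = (1 - SLO) * total. Converting via str() keeps 0.999 exact
        # (999/1000), which avoids float rounding right at the boundary.
        return (1 - Fraction(str(self.slo_target))) * self.total_requests

    def remaining_fraction(self) -> float:
        # Share of the budget still unspent; negative once it is overspent.
        allowed = self.allowed_failures()
        if allowed == 0:
            return 0.0
        return float((allowed - self.failed_requests) / allowed)


def release_allowed(budget: ErrorBudget) -> bool:
    # Hypothetical policy: keep releasing while the window still has budget
    # left, and freeze feature launches once it is exhausted.
    return budget.failed_requests < budget.allowed_failures()
```

For example, with a 99.9% SLO and 1,000,000 requests, the budget is 1,000 failed requests. After 400 failures, `remaining_fraction()` returns 0.6 and `release_allowed` returns True. At 1,000 failures, `release_allowed` returns False. A real error budget policy would typically also specify the window length and how freezes are lifted, which this sketch leaves out.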