Service Reliability Engineering (SRE) Task
A Service Reliability Engineering (SRE) Task is a software engineering task that ensures that an information system can meet its service level agreements throughout its life.
- Context(s):
- It can be performed by a Service Reliability Engineer.
- It can include Production System Improvement, such as: IT system performance tuning.
- It can include responsibility for the availability and reliability of critical platform services and applications.
- …
- Counter-Example(s):
- Software Engineer, such as a backend software engineer.
- Security Engineer.
- See: DevOps, Software Engineering, IT Infrastructure, IT Operations, Scalable IT System, High Availability, Service Reliability.
References
2022
- https://business.linkedin.com/talent-solutions/resources/talent-engagement/job-descriptions/site-reliability-engineer
- Site reliability engineers (SREs) combine engineering experience and an innate drive to improve existing systems and processes, with the creativity to develop novel solutions to evolving challenges. For organizations, SREs are typically responsible for the availability and reliability of critical platform services and applications, ensuring they meet the requirements of internal and external users. The best SREs are motivated to collaborate with business leaders in building and running sustainable production systems, which can evolve and adapt to changes in a global business environment. ...
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
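The responsibility of balancing feature development speed and reliability with well-defined service level objectives is commonly expressed as an error budget: the amount of unavailability an objective still permits within a measurement window. The following is a minimal sketch of that arithmetic, not something taken from the quoted job description. The function names `error_budget_minutes` and `budget_remaining`, the 99.9% target, and the 30-day window are all illustrative assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability SLO allows.

    slo_target is a fraction such as 0.999 (assumed example value).
    window_days is the measurement window (30 days is an assumption).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the interval (0, 1]")
    total_minutes = window_days * 24 * 60
    # The budget is the share of the window NOT covered by the objective.
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float,
                     downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the unused error budget in minutes (negative if overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical scenario: a 99.9% objective with 10 minutes of downtime so far.
    budget = error_budget_minutes(0.999)
    left = budget_remaining(0.999, downtime_minutes=10.0)
    print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

Under these assumptions, a positive remaining budget suggests there is room to ship changes, while a negative value suggests reliability work should take priority. How a team acts on the number is a policy choice that this sketch does not model.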
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Site_reliability_engineering Retrieved:2021-9-10.
- Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Systems_engineering#Related_fields_and_sub-fields Retrieved:2021-9-10.
- Reliability engineering is the discipline of ensuring a system meets customer expectations for reliability throughout its life; i.e., it does not fail more frequently than expected. Next to prediction of failure, it is just as much about prevention of failure. Reliability engineering applies to all aspects of the system. It is closely associated with maintainability, availability (dependability or RAMS preferred by some), and logistics engineering. Reliability engineering is always a critical component of safety engineering, as in failure modes and effects analysis (FMEA) and hazard fault tree analysis, and of security engineering.