Service-Level Indicator (SLI) Measure
A Service-Level Indicator (SLI) Measure is a performance metric that quantifies a specific aspect of the quality of an IT service (IT system).
- Context:
- It can serve as a foundation for defining Service-Level Objectives (SLO), which specify target values for key aspects of service performance.
- It can enable service providers to assess and monitor various dimensions of service quality, such as response time, availability, throughput, and error rate.
- It can range from being a Simple SLI (such as system uptime) to being a Complex SLI (such as transaction completion rate, end-user satisfaction, and mean time between failures (MTBF)).
- It can promote alignment and transparency between service providers and clients by providing a shared understanding of service performance expectations and delivery.
- It can be classified as either a Proposed SLI Measure, which is under consideration for adoption, or an Existing SLI Measure, which is already in use.
- ...
- Example(s):
- Service Availability Measures:
- uptime measurement, calculating the total operational time of a service,
- downtime measurement, tracking the duration of service outages,
- mean time between failures (MTBF), quantifying the average time between service interruptions,
- mean time to repair (MTTR), measuring the average time needed to restore service after an outage,
- Service Performance Measures:
- response time, assessing how quickly the service responds to user requests,
- throughput, measuring the volume of transactions or data processed by the service over a given period,
- error rate, tracking the frequency of service errors or failures,
- resource utilization, monitoring the consumption of computing resources (e.g., CPU, memory, storage) by the service,
- Service Reliability Measures:
- mean time between failures (MTBF), quantifying the average time between service interruptions,
- mean time to repair (MTTR), measuring the average time needed to restore service after an outage,
- failure rate, tracking the frequency of service failures over a given period,
- Service Capacity Measures:
- concurrent users, measuring the number of users simultaneously accessing the service,
- transaction volume, tracking the number of transactions processed by the service over a given period,
- data storage capacity, monitoring the amount of data stored by the service,
- Service User Experience Measures:
- page load time, measuring the speed at which web pages become available to the user,
- user satisfaction score, gauging user satisfaction with the service through surveys or feedback,
- user retention rate, tracking the percentage of users who continue using the service over time,
- ...
- a Domain-Specific SLI, such as:
- an AI-based System SLI, such as: AI Model Accuracy Measure.
- Counter-Example(s):
- Vanity IT Service Metrics, such as data points processed,
- Aggregate Metrics that do not account for data subgroups or edge cases, potentially masking important performance disparities or issues,
- Business Objectives, which encompass broader organizational goals not confined to specific, measurable service metrics,
- See: Service-Level Objective (SLO), Service-Level Agreement (SLA), Performance Indicator, Quality of Service (QoS), Information Technology, Service Provider.
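The availability and reliability measures listed above (uptime, downtime, MTBF, MTTR) can all be derived from a record of outages over a measurement window. The following is a minimal sketch, not a definitive implementation: the `Outage` class, the function names, the sample window, and the 0.999 target are hypothetical, and the code assumes outages do not overlap and fall entirely inside the window. MTBF is taken here as operational time divided by the number of failures, which is one common convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """One service outage; times are seconds from the start of the window."""
    start: float
    end: float


def availability_slis(window_seconds: float, outages: list[Outage]) -> dict[str, float]:
    """Derive uptime, downtime, availability, MTBF, and MTTR for one window.

    Assumes outages are non-overlapping and lie within the window.
    """
    downtime = sum(o.end - o.start for o in outages)
    uptime = window_seconds - downtime
    failures = len(outages)
    return {
        "uptime_seconds": uptime,
        "downtime_seconds": downtime,
        "availability": uptime / window_seconds,
        # MTBF: operational time per failure; MTTR: repair time per failure.
        "mtbf_seconds": uptime / failures if failures else float("inf"),
        "mttr_seconds": downtime / failures if failures else 0.0,
    }


def meets_slo(sli_value: float, slo_target: float) -> bool:
    """An SLO sets a target for an SLI; here a higher SLI value is better."""
    return sli_value >= slo_target


# Hypothetical 30-day window with two outages (30 minutes and 90 minutes).
window = 30 * 24 * 3600
outages = [Outage(100_000, 101_800), Outage(2_000_000, 2_005_400)]
slis = availability_slis(window, outages)
print(slis)
print("meets 99.9% availability SLO:", meets_slo(slis["availability"], 0.999))
```

Keeping the SLO check separate from the SLI calculation mirrors the relationship described in the context list: the indicator is the measurement, and the objective is the target it is compared against.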
References
2024
- https://incident.io/blog/six-key-service-level-indicators
- NOTES:
- SLIs are quantitative measures that evaluate the level of service provided by internal teams or service providers, helping to maintain customer satisfaction and operational efficiency.
- Response time is a critical SLI metric that measures the time taken by a system or service to respond to a specific request, influencing user experience and satisfaction.
- Error rate refers to the proportion of unsuccessful requests out of the total made during a specific time frame, allowing teams to identify and resolve recurring issues affecting system performance.
- Service availability measures the system's ability to process requests successfully and is essential for maintaining user trust and satisfaction.
- System throughput quantifies the amount of work a system can handle within a given time frame, helping teams identify bottlenecks and ensure optimal capacity and efficiency.
- Response latency measures the delay before a response begins and is closely related to response time, with high latency disrupting user experience and making a service seem slow and unresponsive.
- Compliance is an SLI metric that measures how well services align with external standards and regulations, shaping customer trust, mitigating risk, and informing strategies.
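The request-based SLIs in these notes (error rate, availability, throughput, response time) can be computed from a log of individual requests. The sketch below is illustrative only: the `Request` record, the function names, and the sample data are hypothetical, `latency_ms` is assumed to hold the full response time rather than the delay before the response begins, and the nearest-rank method is one of several ways to define a percentile.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    latency_ms: float  # assumed: time from request to complete response
    ok: bool           # True if the request succeeded


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def request_slis(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute request-based SLIs for the requests observed in one window."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests observed in the window")
    failed = sum(1 for r in requests if not r.ok)
    return {
        "error_rate": failed / total,              # unsuccessful / total
        "availability": (total - failed) / total,  # successful / total
        "throughput_rps": total / window_seconds,  # requests per second
        "p95_response_ms": percentile([r.latency_ms for r in requests], 95),
    }


# Hypothetical sample: 20 requests in a 10-second window, one of which failed.
sample = [Request(latency_ms=100 + 10 * i, ok=(i != 7)) for i in range(20)]
slis = request_slis(sample, window_seconds=10)
print(slis)
```

A percentile is used for response time instead of a mean because a small number of very slow requests can be hidden by an average, which is the same masking concern raised for aggregate metrics in the counter-examples.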