Chaos Engineering
Jump to navigation
Jump to search
A Chaos Engineering is a software engineering practice that ...
- Context:
- It can be used to achieve resilience against:
- Infrastructure failures
- Network failures
- Application failures
- It can be used to achieve resilience against:
- See: Quality of Service, Fault Tolerance.
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Chaos_engineering Retrieved:2021-8-31.
- Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions.
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Chaos_engineering#Concept Retrieved:2021-8-31.
- In software development, a given software system's ability to tolerate failures while still ensuring adequate quality of service—often generalized as resiliency—is typically specified as a requirement. However, development teams often fail to meet this requirement due to factors such as short deadlines or lack of knowledge of the field. Chaos engineering is a technique to meet the resilience requirement.
Chaos engineering can be used to achieve resilience against:
- Infrastructure failures
- Network failures
- Application failures
- In software development, a given software system's ability to tolerate failures while still ensuring adequate quality of service—often generalized as resiliency—is typically specified as a requirement. However, development teams often fail to meet this requirement due to factors such as short deadlines or lack of knowledge of the field. Chaos engineering is a technique to meet the resilience requirement.
2017
- 14 Essential software engineering practices for your agile project."
- QUOTE: Chaos Engineering: Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic. You need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses could take the form of: improper fallback settings when a service is unavailable; retry storms from improperly tuned timeouts; outages when a downstream dependency receives too much traffic; cascading failures when a single point of failure crashes; etc. You must address the most significant weaknesses proactively before they affect our customers in production. You need a way to manage the chaos inherent in these systems, take advantage of increased flexibility and velocity, and have confidence in your production deployments despite the complexity that they represent. An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. This is called Chaos Engineering. A good example of this would be the Chaos Monkey of Netflix.