2020 DistributedTracinginPracticeIns
- (Parker et al., 2020) ⇒ A. Parker, D. Spoonhower, J. Mace, B. Sigelman, and R. Isaacs. (2020). “Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices.” O'Reilly Media. ISBN:9781492056607
Subject Headings: Software System Tracing, Distributed Application Tracing.
Notes
Cited By
2021
Quotes
Book Overview
https://www.oreilly.com/library/view/distributed-tracing-in/9781492056621/
Since most applications today are distributed in some fashion, monitoring their health and performance requires a new approach. Enter distributed tracing, a method of profiling and monitoring distributed applications — particularly those that use microservice architectures. There’s just one problem: distributed tracing can be hard. But it doesn’t have to be.
With this guide, you’ll learn what distributed tracing is and how to use it to understand the performance and operation of your software. Key players at LightStep and other organizations walk you through instrumenting your code for tracing, collecting the data that your instrumentation produces, and turning it into useful operational insights. If you want to implement distributed tracing, this book tells you what you need to know.
You’ll learn:
- The pieces of a distributed tracing deployment: instrumentation, data collection, and analysis
- Best practices for instrumentation: methods for generating trace data from your services
- How to deal with (or avoid) overhead using sampling and other techniques
- How to use distributed tracing to improve baseline performance and to mitigate regressions quickly
- Where distributed tracing is headed in the future
Table of Contents
0. Introduction: What Is Distributed Tracing?
   Distributed Architectures and You; Deep Systems; The Difficulties of Understanding Distributed Architectures; How Does Distributed Tracing Help?; Distributed Tracing and You; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. The Problem with Distributed Tracing
   The Pieces of a Distributed Tracing Deployment; Distributed Tracing, Microservices, Serverless, Oh My!; The Benefits of Tracing; Setting the Table
2. An Ontology of Instrumentation
   White Box Versus Black Box; Application Versus System; Agents Versus Libraries; Propagating Context; Interprocess Propagation; Intraprocess Propagation; The Shape of Distributed Tracing; Tracing-Friendly Microservices and Serverless; Tracing in a Monolith; Tracing in Web and Mobile Clients
3. Open Source Instrumentation: Interfaces, Libraries, and Frameworks
   The Importance of Abstract Instrumentation; OpenTelemetry; OpenTracing and OpenCensus; OpenTracing; OpenCensus; Other Notable Formats and Projects; X-Ray; Zipkin; Interoperability and Migration Strategies; Why Use Open Source Instrumentation?; Interoperability; Portability; Ecosystem and Implicit Visibility
4. Best Practices for Instrumentation
   Tracing by Example; Installing the Sample Application; Adding Basic Distributed Tracing; Custom Instrumentation; Where to Start—Nodes and Edges; Framework Instrumentation; Service Mesh Instrumentation; Creating Your Service Graph; What’s in a Span?; Effective Naming; Effective Tagging; Effective Logging; Understanding Performance Considerations; Trace-Driven Development; Developing with Traces; Testing with Traces; Creating an Instrumentation Plan; Making the Case for Instrumentation; Instrumentation Quality Checklist; Knowing When to Stop Instrumenting; Smart and Sustainable Instrumentation Growth
5. Deploying Tracing
   Organizational Adoption; Start Close to Your Users; Start Centrally: Load Balancers and Gateways; Leverage Infrastructure: RPC Frameworks and Service Meshes; Make Adoption Repeatable; Tracer Architecture; In-Process Libraries; Sidecars and Agents; Collectors; Centralized Storage and Analysis; Incremental Deployment; Data Provenance, Security, and Federation; Frontend Service Telemetry; Server-Side Telemetry for Managed Services
6. Overhead, Costs, and Sampling
   Application Overhead; Latency; Throughput; Infrastructure Costs; Network; Storage; Sampling; Minimum Requirements; Strategies; Selecting Traces; Off-the-Shelf ETL Solutions
7. A New Observability Scorecard
   The Three Pillars Defined; Metrics; Logging; Distributed Tracing; Fatal Flaws of the Three Pillars; Design Goals; Assessing the Three Pillars; Three Pipes (Not Pillars); Observability Goals and Activities; Two Goals in Observability; Two Fundamental Activities in Observability; A New Scorecard; The Path Ahead
8. Improving Baseline Performance
   Measuring Performance; Percentiles; Histograms; Defining the Critical Path; Approaches to Improving Performance; Individual Traces; Biased Sampling and Trace Comparison; Trace Search; Multimodal Analysis; Aggregate Analysis; Correlation Analysis
9. Restoring Baseline Performance
   Defining the Problem; Human Factors; (Avoiding) Finger-Pointing; “Suppressing” the Messenger; Incident Hand-off; Good Postmortems; Approaches to Restoring Performance; Integration with Alerting Workflows; Individual Traces; Biased Sampling; Real-Time Response; Knowing What’s Normal; Aggregate and Correlation Root Cause Analysis
10. Are We There Yet? The Past and Present
   Distributed Tracing: A History of Pragmatism; Request-Based Systems; Response Time Matters; Request-Oriented Information; Notable Work; Pinpoint; Magpie; X-Trace; Dapper; Where to Next?
11. Beyond Individual Requests
   The Value of Traces in Aggregate; Example 1: Is Network Congestion Affecting My Application?; Example 2: What Services Are Required to Serve an API Endpoint?; Organizing the Data; A Strawperson Solution; What About the Trade-offs?; Sampling for Aggregate Analysis; The Processing Pipeline; Incorporating Heterogeneous Data; Custom Functions; Joining with Other Data Sources; Recap and Case Study; The Value of Traces in Aggregate; Organizing the Data; Sampling for Aggregate Analysis; The Processing Pipeline; Incorporating Heterogeneous Data
12. Beyond Spans
   Why Spans Have Prevailed; Visibility; Pragmatism; Portability; Compatibility; Flexibility; Why Spans Aren’t Enough; Graphs, Not Trees; Inter-Request Dependencies; Decoupled Dependencies; Distributed Dataflow; Machine Learning; Low-Level Performance Metrics; New Abstractions; Seeing Causality
13. Beyond Distributed Tracing
   Limitations of Distributed Tracing; Challenge 1: Anticipating Problems; Challenge 2: Completeness Versus Costs; Challenge 3: Open-Ended Use Cases; Other Tools Like Distributed Tracing; Census; A Motivating Example; A Distributed Tracing Solution?; Tag Propagation and Local Metric Aggregation; Comparison to Distributed Tracing; Pivot Tracing; Dynamic Instrumentation; Recurring Problems; How Does It Work?; Dynamic Context; Comparison to Distributed Tracing; Pythia; Performance Regressions; Design; Overheads; Comparison to Distributed Tracing
14. The Future of Context Propagation
   Cross-Cutting Tools; Use Cases; Distributed Tracing; Cross-Component Metrics; Cross-Component Resource Management; Managing Data Quality Trade-offs; Failure Testing of Microservices; Enforcing Cross-System Consistency; Request Duplication; Record Lineage in Stream Processing Systems; Auditing Security Policies; Testing in Production; Common Themes; Should You Care?; The Tracing Plane; Is Baggage Enough?; Beyond Key-Value Pairs; Compiling BDL; BaggageContext; Merging; Overheads
A. The State of Distributed Tracing Circa 2020
   Open Source Tracers and Trace Analysis; Commercial Tracers and Trace Analyzers; Language-Specific Tracing Features; Java and C#; Go, Rust, and C++; Python, JavaScript, and Other Dynamic Languages
B. Context Propagation in OpenTelemetry
   Why a Separate Context Model?; The OpenTelemetry Context Model; W3C CorrelationContext and the Correlations API; Distributed and Local Context; Examples and Potential Applications
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020 DistributedTracinginPracticeIns | A. Parker, D. Spoonhower, J. Mace, B. Sigelman, R. Isaacs | | | Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices | | | | | | 2020 |