As architect of the KPN API Store, I gave a presentation on observability and our quest to get more insight into our platform.
A very brief overview of the slides:
Monitoring
Monitoring is for symptom-based alerting
Your monitoring system should address two questions: what’s broken, and why?
- The “what’s broken” indicates the symptom.
- The “why” indicates a (possibly intermediate) cause.
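The split between symptom and cause can be sketched in a few lines of Python. This is a minimal illustration, not how the KPN API Store does it: the `Alert` type, the `evaluate` function, the two input metrics and both thresholds are hypothetical names and values chosen for the example. The point it shows is that the alert fires only on the user-visible symptom, while the cause is attached as a hint when one happens to be visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    symptom: str            # what's broken (user-visible)
    cause: Optional[str]    # why (possibly intermediate), if known


def evaluate(error_rate: float, db_pool_in_use: float,
             error_threshold: float = 0.05,
             pool_threshold: float = 0.9) -> Optional[Alert]:
    """Alert only on the symptom; attach a cause hint when one is visible.

    All metric names and thresholds here are assumptions for illustration.
    """
    if error_rate <= error_threshold:
        # No user-visible symptom, so no alert, even if the pool is busy.
        return None
    symptom = f"error rate {error_rate:.1%} exceeds {error_threshold:.0%}"
    cause = None
    if db_pool_in_use >= pool_threshold:
        cause = f"database connection pool {db_pool_in_use:.0%} in use"
    return Alert(symptom=symptom, cause=cause)


if __name__ == "__main__":
    print(evaluate(error_rate=0.01, db_pool_in_use=0.99))  # busy pool, no symptom
    print(evaluate(error_rate=0.12, db_pool_in_use=0.95))  # symptom plus cause hint
```

A saturated connection pool on its own never pages anyone in this sketch; it only becomes relevant once users are actually seeing errors.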
Observability
“Observability” is a superset of “monitoring”, providing certain benefits and insights that “monitoring” tools cannot deliver.
Observability is about unknown-unknowns: it empowers you to ask new questions and to answer questions yet to be formulated …
Lessons learned
- Get a platform dashboard on-screen
- Beware of the anti-pattern of monitoring everything
- Understand the social and financial implications
- From unknown-unknowns to known-unknowns and more questions
Sources
- Monitoring and Observability
- Distributed Systems Observability
- Observability at Twitter: Part 1 & Part 2
- SRE fundamentals: SLIs, SLAs and SLOs
- SRE vs. DevOps: competing standards or close friends?
- Site Reliability Engineering & The Site Reliability Workbook (free online books)
- How To Apply Google’s Site Reliability Engineering Approach To Your Infrastructure
- An Introduction to Metrics, Monitoring, and Alerting
- Observability: A Manifesto
- Three Pillars with Zero Answers: Rethinking Observability
- Observability?! – Where do we go from here?
- Lessons from Building Observability Tools at Netflix
- OpenAPM Landscape
- Paessler
References
- KPN API Store
- Slides on SpeakerDeck