3 AM. Your on-call engineer is staring at a blank dashboard while your company's payment processing is failing. The monitoring alerts are screaming, but they only tell you that "API latency is high." Is it the database? The cache layer? A dependency? The CPU? Nobody knows. By the time someone figures it out—usually by adding logging to production—you've lost $50K in transactions. This is what happens when you confuse monitoring with observability.
Most teams don't understand this distinction. And frankly, it costs them.
The Myth of Metrics
I spent five years as an SRE at a fintech startup in Ho Chi Minh City, and I watched us make every beginner mistake in the book. We had 127 Prometheus metrics. Beautiful dashboards in Grafana. Alerts for everything. Yet somehow, when something broke, we were essentially blind.
The problem? We were monitoring—checking predefined conditions we thought might matter. We weren't building observability—the ability to ask arbitrary questions about our systems without having written the code to answer them first.
Here's what I mean: with monitoring, you ask, "Is this metric above threshold X?" With observability, you ask, "Why are 0.3% of our payments stuck in a pending state?"
The distinction matters because you can't predict everything that will go wrong. Your system is too complex. The interactions are too numerous. Emergent behavior happens. That's why observability is fundamentally about capturing rich enough raw signals that you can investigate anything.
Three Pillars, One Philosophy
People will tell you observability has three pillars: metrics, logs, and traces. That's true but incomplete. The real answer is: metrics answer 'how much,' logs answer 'what happened,' and traces answer 'in what sequence.'
But here's what nobody talks about: they're useless in isolation.
I've seen teams with world-class logging that couldn't correlate log events across a distributed system because they didn't instrument their applications with trace IDs. I've seen impeccable metrics that missed entire categories of failures because nobody was recording the right dimensions. I've seen detailed traces that were too expensive to sample broadly, so when rare bugs occurred, they went untraced.
The magic happens when these three work together. You see a spike in error rates (metrics). You drill into logs from that time window (logs). You pick a failing request and follow its entire path across 12 services (traces). Suddenly, you know exactly what happened.
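To make that correlation concrete, here is a minimal sketch of stamping every log line with the active trace ID, assuming an OpenTelemetry-instrumented Node.js service and pino as the logger (both are my assumptions, not something this post prescribes):

```typescript
import { trace } from "@opentelemetry/api";
import pino from "pino";

const logger = pino({
  mixin() {
    // Merge the current trace context into every log line so a log search
    // can pivot straight into the corresponding distributed trace.
    const span = trace.getActiveSpan();
    if (!span) return {};
    const { traceId, spanId } = span.spanContext();
    return { traceId, spanId };
  },
});

// Inside any instrumented request handler:
logger.info({ orderId: "ord_123" }, "payment moved to pending"); // now carries traceId
```

With that in place, the workflow above stops being aspirational: the metric spike points to a time window, the logs in that window carry trace IDs, and each trace ID opens the full request path.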
The Economics Nobody Mentions
Here's what your CFO won't ask but should: How much is observability costing you, and what's the break-even point?
A properly instrumented system with comprehensive observability can cost 2-3x your actual infrastructure spend. At my fintech startup, we spent $40K/month on infrastructure and $80K/month on Datadog. Executives questioned it constantly, until one incident in 2019 when we detected a memory leak in production within 6 minutes instead of the 3+ hours it would have taken with basic monitoring. That six-minute difference prevented an estimated $200K in chargebacks and regulatory fines.
Calculate your own break-even. How much does an hour of downtime cost your business? If it's $10K/hour and you have one incident per quarter, observability that costs you $30K/month is a rounding error.
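If you want to run that math yourself, here is a rough calculator; every input is an assumption to replace with your own figures, and the "other losses avoided" value simply echoes the 2019 incident above:

```typescript
// Back-of-the-envelope break-even for observability spend. All numbers are illustrative.
function observabilityBreakEven(opts: {
  downtimeCostPerHour: number;           // what an hour of downtime costs the business
  incidentsPerYear: number;              // how often you get paged for something real
  hoursSavedPerIncident: number;         // how much faster you diagnose with good observability
  otherLossesAvoidedPerIncident: number; // chargebacks, fines, SLA penalties, etc.
  monthlyObservabilitySpend: number;
}) {
  const annualSavings =
    opts.incidentsPerYear *
    (opts.downtimeCostPerHour * opts.hoursSavedPerIncident + opts.otherLossesAvoidedPerIncident);
  const annualSpend = opts.monthlyObservabilitySpend * 12;
  return { annualSavings, annualSpend, worthIt: annualSavings > annualSpend };
}

console.log(observabilityBreakEven({
  downtimeCostPerHour: 10_000,
  incidentsPerYear: 4,
  hoursSavedPerIncident: 3,               // assumed: 6 minutes vs 3+ hours to find the cause
  otherLossesAvoidedPerIncident: 200_000, // assumed: the chargebacks-and-fines scenario above
  monthlyObservabilitySpend: 30_000,
}));
// → { annualSavings: 920000, annualSpend: 360000, worthIt: true }
```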
The Vietnam Market Peculiarity
Something interesting happens in Vietnam: rapid growth without legacy infrastructure. I've consulted with fintech and logistics startups here that built observability correctly from day one. They scaled to handle 10x user growth without the observability debt that plagues companies built on older patterns.
But there's a catch. Vietnamese developers often gravitate toward open-source solutions (Prometheus, Grafana, OpenTelemetry) because cost matters. And that's smart—but it requires discipline. Open-source tooling won't hold your hand. You need a strong platform engineering culture to make it work. Many Vietnamese startups solve this by bringing in experienced practitioners or partnering with firms that specialize in this.
The Practical Reality
If you're starting observability work, understand what's actually hard:
Instrumentation is boring but critical. Adding @opentelemetry/auto-instrumentations-node to your Node.js app takes 30 seconds. Getting every database query, every Redis call, and every external API dependency *properly traced with business context* takes weeks. Most teams do the minimum and wonder why traces tell them nothing.
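For illustration, here is a hedged sketch of what "properly traced with business context" can look like with the OpenTelemetry tracing API; the payment function, gateway stub, and attribute names are all hypothetical:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("payments-service");

// Stand-in for the real gateway client.
async function callPaymentGateway(order: { id: string }) {
  return { status: "captured" };
}

async function chargePayment(order: { id: string; amountVnd: number; method: string; tier: string }) {
  return tracer.startActiveSpan("payment.charge", async (span) => {
    // Business attributes: these are what make the trace useful during an incident,
    // because you can slice failing traces by method, amount, or customer tier.
    span.setAttribute("payment.method", order.method);
    span.setAttribute("payment.amount_vnd", order.amountVnd);
    span.setAttribute("customer.tier", order.tier);
    try {
      const result = await callPaymentGateway(order);
      span.setAttribute("payment.gateway_status", result.status);
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

Auto-instrumentation gives you the skeleton of the trace; attributes like these are the weeks of work that make it answer business questions.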
Cardinality is your enemy. If you're tagging every metric with user ID and request ID, congratulations—you've just created infinite time series. Your observability system becomes expensive and unusable. You need discipline about what you tag and why.
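A quick sketch of the trap, assuming prom-client as the metrics library; the metric and label names are illustrative:

```typescript
import client from "prom-client";

// BAD: user_id and request_id are unbounded, so every request mints a new time series.
// new client.Counter({
//   name: "payments_total",
//   help: "Payments processed",
//   labelNames: ["user_id", "request_id", "status"],
// });

// BETTER: bounded, low-cardinality dimensions you actually aggregate over.
const paymentsTotal = new client.Counter({
  name: "payments_total",
  help: "Payments processed",
  labelNames: ["status", "method", "region"],
});

paymentsTotal.inc({ status: "failed", method: "qr", region: "vn-south" });
// High-cardinality detail (user ID, request ID) belongs in logs and trace attributes,
// where it can be searched without exploding your metric store.
```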
High cardinality + sampling = blindness. If you only sample 1% of traces, and a rare bug affects 0.5% of requests, you'll probably never see it. This is where purposeful sampling and tail-based sampling come in, but most teams don't know these techniques exist.
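As a rough sketch, this is where the head-based sampling knob usually sits in an OpenTelemetry Node.js setup; tail-based sampling happens downstream in the collector (its tail_sampling processor), not in application code:

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { ParentBasedSampler, TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

const sdk = new NodeSDK({
  // Head-based: decide up front and keep 1% of traces. Cheap, but a bug that hits
  // 0.5% of requests will only rarely appear in what you kept.
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.01) }),
});

sdk.start();
```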
What Expert Practitioners Actually Do
After talking to dozens of site reliability engineers across Vietnamese and Southeast Asian tech companies, here's what separates the good from the mediocre:
1. They instrument for business outcomes, not infrastructure. Not "track CPU usage" but "track payment success rate by geography by payment method by customer tier."
2. They build alerting on real SLOs, not arbitrary thresholds. "Error rate exceeds 0.1%" means nothing. "We're violating our SLO of 99.9% availability" means act now. (See the burn-rate sketch after this list.)
3. They treat observability as a product. The best teams have a shared observability platform that developers actually *want* to use because it's easy and answers questions quickly.
4. They understand that perfect instrumentation is the enemy of good. You don't need 100% trace sampling. You need smart sampling, good log levels, and the right metrics. Perfection gets expensive fast.
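Here is a small, illustrative burn-rate calculation for the SLO-based alerting in point 2; the traffic numbers are invented:

```typescript
const sloTarget = 0.999;             // 99.9% availability SLO
const errorBudget = 1 - sloTarget;   // 0.1% of requests may fail over the SLO window

// Observed over the last hour (in practice this comes from your metrics backend):
const requestsLastHour = 120_000;
const failedLastHour = 480;
const observedErrorRate = failedLastHour / requestsLastHour; // 0.4%

// Burn rate: how many times faster than sustainable you are spending error budget.
const burnRate = observedErrorRate / errorBudget; // 4x here
if (burnRate > 1) {
  console.log(`Burning error budget ${burnRate.toFixed(1)}x too fast: page someone.`);
}
```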
The Uncomfortable Truth
Most observability implementations fail not because the tools are bad, but because teams underestimate the engineering effort required. They think buying a tool like Datadog or New Relic solves the problem. It doesn't. The tool is 20% of the solution. The other 80% is deciding what to measure, how to instrument code, what to alert on, and building the muscle memory to actually *use* the tools to solve problems.
Observability is less about technology and more about culture and discipline.
---
If you're building systems that matter—especially in high-growth environments like Vietnam's fintech and e-commerce scene—observability isn't optional. It's the difference between sleeping at night and living in fear of that 3 AM page. Tools like Prometheus and Grafana work great if you have the engineering depth, but increasingly, teams are recognizing that the human cost of building and maintaining this in-house is too high. Companies like Idflow Technology are helping Southeast Asian startups instrument their systems properly without needing to hire a dedicated platform team—which, frankly, is a smart move if you're not yet at the scale where that investment makes sense.
Build observability into your systems early. Your future self will thank you.