Originally posted on DZone
Hey internet humans! I’ve recently re-entered the world of observability and monitoring after a short detour in the Internal Developer Portal space. Since my return, I’ve felt a strong urge to talk about the generally sad state of observability in the market today.
I still have a vivid memory of being knee-deep in Kubernetes configs, drowning in a sea of technical jargon, never quite sure whether I’d actually monitored everything in my stack, deploying heavy agents, and fighting with engineering managers and devs just to get their code instrumented, only to find out I didn’t have half the coverage I thought I did. Sound familiar? Most of us have been there.
The three pain points that are top-of-mind for me these days are:
- The state of instrumentation for observability
- The horrible surprise bills vendors are springing on customers, and pricing models so confusing that costs can’t even be calculated up front
- Ownership and storage of data: residency, compliance, and control
Instrumentation
The monitoring community has a fantastic new tool at its disposal: eBPF. It’s game-changing tech (a cheat code, if you will) that lets us trace what’s going on in our systems without all the usual headaches. With eBPF, we can dive deep into the inner workings of applications and infrastructure, capturing data at the kernel level with minimal overhead.
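To make the “kernel level, no code changes” point concrete, here’s a minimal sketch using the BCC Python front end. This is an illustration, not a production probe: it assumes a Linux host with root privileges and the `bcc` package installed, and the probe simply logs every `execve` syscall without touching or redeploying any application code.

```python
from bcc import BPF

# A tiny eBPF program, compiled and loaded into the kernel by BCC.
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("execve observed\n");
    return 0;
}
"""

b = BPF(text=program)
# Attach to the execve syscall: we now see every process launch system-wide,
# with no agent inside the applications themselves.
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello")
b.trace_print()  # streams one line per process launch until interrupted
```

Real observability products layer a lot of “secret sauce” (protocol parsing, correlation, aggregation) on top of probes like this, but the zero-instrumentation data capture is the same idea.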
I’ve had first-hand experience deploying monitoring solutions at scale during my tenure at companies like Datadog, Splunk, and CA Technologies. I’ve seen the mix of APM, infrastructure, logs, OpenTelemetry, custom instrumentation, and open-source tools that teams stitch together (usually poorly) just to cover the basics.
At this point, two things happen:
- Not everything is monitored because we have no idea where everything is. We end up with far less than 100% coverage.
- We start having those cringe-worthy “should we even monitor this thing?” discussions, because monitoring often costs more than the infrastructure our applications and microservices run on.
OpenTelemetry is fantastic for solving vendor lock-in and has a much larger community behind it, but it takes A LOT OF WORK. It takes real collaboration across teams to make sure everyone instruments their code manually and that every single library is well supported. From what I’ve observed, the result is usually a patchwork that gives us an incomplete picture 95% of the time.
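Here’s what that manual work looks like in practice, as a minimal sketch with the OpenTelemetry Python SDK (assumes `opentelemetry-sdk` is installed; the service and span names are made up for illustration):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints finished spans to stdout.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("checkout-service")  # hypothetical service name

# Every span like this has to be added by hand, in every code path,
# on every team -- which is exactly where the patchwork comes from.
with tracer.start_as_current_span("charge-card") as span:
    span.set_attribute("order.id", "A-1234")
```

Auto-instrumentation libraries cover the popular frameworks, but anything bespoke still needs blocks like this written and maintained by hand across every service.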
With proper eBPF deployment and some secret sauce, these core concerns simply stop being our problem, as long as there’s a simplified pricing model in place. We can get full-on 360-degree visibility into our environments, with tracing, metrics, and logs, without the hassle.
The Elephant in the Room: Cost and the Awful State of Pricing in the Observability Market Today
If I had a penny for every time I’ve heard: “I need an observability tool to monitor the cost of my observability tool.”
Traditional monitoring tools often come with hefty price tags attached, and often ones that arrive as a big fat surprise when we add a metric or a log line, especially at scale! These tools typically charge based on the volume of data ingested, and it’s easy to underestimate how quickly those costs add up.
I’ve seen customers receive tens of thousands of dollars (sometimes hundreds of thousands) in overage bills because some developer added a few extra log lines or because someone needed additional cardinality in a metric. Those costs are very real for very simple mistakes, especially since there are often no controls in place to prevent them.
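To see how fast cardinality compounds, here’s a back-of-the-envelope sketch. The per-series price and label counts are made up for illustration; real vendor pricing varies, but the multiplication is the part that bites:

```python
import math

# Hypothetical list price: $0.05 per active metric series per month.
PRICE_PER_SERIES = 0.05

def active_series(label_cardinalities):
    """Each unique combination of label values is billed as its own series."""
    return math.prod(label_cardinalities)

# A request-latency metric with labels endpoint (200 values),
# status (5 values), and pod (300 values):
before = active_series([200, 5, 300])        # 300,000 series
# A developer adds a customer_id label with 1,000 distinct values:
after = active_series([200, 5, 300, 1_000])  # 300,000,000 series

print(f"before: ${before * PRICE_PER_SERIES:,.0f}/mo")  # before: $15,000/mo
print(f"after:  ${after * PRICE_PER_SERIES:,.0f}/mo")   # after:  $15,000,000/mo
```

One extra label, and the bill moves three orders of magnitude; that’s the mechanism behind most of the surprise invoices I’ve seen.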
That’s when a modern solution should step in to save the day. By offering transparent pricing based on usage, not volume, ingest, egress, or some opaque metric you have no idea how to calculate, we should be able to get specific about the cost of monitoring and set clear expectations, knowing we can see everything end-to-end without cutting corners because the cost might be too high.
Ownership and Storage of Data
The next topic I’d like to touch upon is the importance of data residency, compliance, and security in the realm of observability solutions. In today’s business landscape, maintaining control over where and how data is stored and accessed is crucial. Various regulations, such as GDPR, require organizations to adhere to strict guidelines regarding data storage and privacy.
Traditional cloud-based observability solutions may present challenges in meeting these compliance requirements, as they often store data on third-party servers dispersed across different regions.
Opting for an observability solution that allows for on-premises data storage addresses these concerns effectively. By keeping monitoring data within the organization’s data center, businesses gain greater control over its security and compliance. This approach minimizes the risk of unauthorized access or data breaches, thereby enhancing data security and simplifying compliance efforts.
For organizations seeking to ensure compliance, enhance data security, and optimize costs, an observability solution that facilitates on-premises data storage is a compelling option. By maintaining control over data residency and security while achieving cost efficiencies, businesses can focus on their core competencies and revenue-generating activities with confidence.