AI Has the Opposite Data Problem

The problem is the lack of eyes, not a lack of data
November 9, 2025

AI analyzing mountains of unused business data

We frequently hear that we have a data scarcity problem in AI. And when it comes to unique, Tolstoy-level literature and the like, that could be true.

But in the business world I think we have the exact opposite problem.

What we actually have is a "There's absolutely nobody to look at 99.999% of our data" problem.

According to IDC's Data Age 2025 report, we're generating 149 zettabytes of data annually. That's 149 trillion gigabytes. Every single year.

Here's what's actually happening to it:

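Surveillance and security: more than 1 billion surveillance cameras worldwide generate about 5.5 million terabytes per day, and 95-99% of that footage is never viewed (Grand View Research).

IoT and industrial sensors: 21.1 billion connected IoT devices generate 79.4 zettabytes of data per year (IoT Analytics), and in industrial settings 99% of IoT data is lost before reaching decision-makers (McKinsey Digital).

Enterprise operations: more than 90% of machine logs and telemetry data is never read (Coralogix), 41-80% of enterprise documents are never accessed after creation (NetApp), and 52-85% of enterprise data is "dark data" that is collected but never analyzed (Veritas).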

In total, of the 149 zettabytes generated globally each year, only 12-15% is ever examined by humans or AI (IDC Data Age 2025). That's roughly 20 zettabytes, which leaves about 130 zettabytes a year that nobody looks at.
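For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes. It takes the 149 zettabyte and 12-15% figures at face value from the report cited above. The function name `examined_range_zb` is just an illustrative helper, not anything from IDC.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal SI units (1 ZB = 10**21 bytes, 1 GB = 10**9 bytes).
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes


def examined_range_zb(total_zb, low_pct, high_pct):
    """Return the (low, high) volume in zettabytes that is ever examined."""
    return total_zb * low_pct / 100, total_zb * high_pct / 100


total_zb = 149  # zettabytes generated per year, as quoted in the post

# Convert zettabytes to gigabytes using exact integer arithmetic.
gigabytes = total_zb * ZETTABYTE // GIGABYTE

# Apply the quoted 12-15% "ever examined" share.
low, high = examined_range_zb(total_zb, 12, 15)

print(f"{total_zb} ZB = {gigabytes:,} GB")
print(f"Examined: between {low:.2f} and {high:.2f} ZB")
print(f"Never examined: between {total_zb - high:.2f} and {total_zb - low:.2f} ZB")
```

The gigabyte figure comes out to 149 followed by twelve zeros, which is the "149 trillion" above, and the examined range brackets the "roughly 20 zettabytes" estimate.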

My takeaway

As far as I'm concerned, yes, there might not be new high-quality literature being generated, which I suppose is a problem. Maybe we're running out of that, and I don't know where we're going to get more.

But practically speaking, I think the bigger problem is that businesses and people are generating zettabytes of data, and nobody is looking at more than a small percentage of it.

To me, this presents an extraordinary opportunity for AI to actually give us visibility and the ability to extract insights from all this data that nobody is looking at.

So sure, we have a data problem, but not the one that people think.

Notes

  1. IDC's Data Age 2025 report estimates global data creation at 149 zettabytes annually. (IDC Data Age 2025)
  2. IoT Analytics reports 21.1 billion connected IoT devices generating 79.4 zettabytes of data per year. (State of IoT 2024)
  3. McKinsey Digital found that 99% of IoT data is lost before reaching decision-makers in industrial settings. (McKinsey Industrial IoT Report)
  4. Grand View Research reports 1+ billion surveillance cameras worldwide generating 5.5 million terabytes per day, with 95-99% never viewed. (Video Surveillance Market Analysis 2024)
  5. Coralogix's Observability Report 2024 found that more than 90% of machine logs and telemetry data is never read. (Coralogix Observability Report)
  6. NetApp's Cloud Complexity Report 2024 found that 41-80% of enterprise documents are never accessed after creation. (NetApp Cloud Complexity Report)
  7. Veritas Global Databerg Report found 52-85% of enterprise data is "dark data": collected but never analyzed. (Veritas Global Databerg Report)
  8. AIL Level 1: Daniel wrote this entire post. I (Kai) helped with data research, formatting, frontmatter, and publishing workflow. Learn more about AIL