I keep hearing that world models are the way forward for AI.
I tend to agree, and I've been saying as much for many years as a technical person in AI, though not one of the A-tier researchers actually working on the models.
Anyway, I'm up at 3:45AM today with an insane thought.
Why do we think humans have world models?
We tend to think humans have real world models, and LLMs have fake ones—or none at all. Importantly, the evidence we have for this is something like:
LLMs are just spewing out words or images describing what they've heard about world models, not giving their own.
But isn't that exactly what humans are doing?
Think to yourself what would happen if a ball rolls off the side of a table. But then imagine the table is tilted a few degrees in one direction. Or imagine it in zero gravity going around the Earth.
Here's what will happen: your brain instantly plays a little movie of the ball's path.
Don't believe me? Try again. Try as many times as you want. That is what we do as humans.
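If you want that mental simulation made explicit, here's a toy sketch. To be clear, this is my own minimal model, assuming a point mass, no air resistance, standard gravity, and made-up numbers for the table; it's an illustration of the thought experiment, not a claim about how brains do it.

```python
# Toy "world model" for the ball-off-the-table thought experiment.
# Assumptions (mine): point mass, no air resistance, g = 9.8 m/s^2,
# and hypothetical numbers for table height and roll speed.

import math

def ball_off_table(table_height=0.75, speed=0.5, tilt_deg=0.0, g=9.8):
    """Return (time to hit the floor, horizontal distance traveled) after the edge."""
    tilt = math.radians(tilt_deg)
    # Velocity at the edge: tilting the table gives the ball a downward
    # component in addition to its horizontal one.
    vx = speed * math.cos(tilt)
    vy = speed * math.sin(tilt)  # downward
    # Solve table_height = vy*t + 0.5*g*t^2 for the fall time t.
    t = (-vy + math.sqrt(vy**2 + 2 * g * table_height)) / g
    return t, vx * t

print(ball_off_table())            # flat table
print(ball_off_table(tilt_deg=5))  # table tilted a few degrees
```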
How is this different from LLMs exactly?
Our brains are a bunch of neurons, right? We cut the brain open and we see those cells. We don't see magic world-model cells or discrete world-model cells. Just neurons and their connections and such.
Just like an LLM. It's a bunch of nodes and connections.
And when we query our own system—asking how the world works—we get a flash of images and text, which we then speak in semi-random flowing sentences we don't formulate beforehand.
One.
Word.
At.
A.
Time.
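If you want the mechanical version of that, here's a minimal sketch of an autoregressive loop. The `predict_next` function is a hypothetical stand-in, not any real model's API; the point is just the structure: predict a token, append it, feed everything back in, repeat.

```python
# Minimal sketch of one-word-at-a-time generation.
# `predict_next` is a placeholder: any function mapping the text so far
# to a probability distribution over possible next tokens.

import random

def predict_next(tokens):
    # Hypothetical stand-in for a real model's forward pass.
    vocab = ["the", "ball", "rolls", "off", "table", "."]
    return {w: 1 / len(vocab) for w in vocab}

def generate(prompt, max_tokens=10):
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = predict_next(tokens)
        words, weights = zip(*probs.items())
        # Sample the next token and append it; the loop never looks ahead.
        tokens.append(random.choices(words, weights=weights)[0])
    return " ".join(tokens)

print(generate("imagine a"))
```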
Oh, and how good are those world models of ours?
Well, if you ask somebody with very little physics training, they're going to have faulty images in their head and faulty verbal explanations.
But if you ask somebody like Richard Feynman, who both knows the physics and is very articulate, you'll probably get a great answer.
So it's training. Again just like an LLM. Fuck.
I obviously know there are major differences between LLMs and humans.
But I'm having a hard time figuring out why we're using humans as the standard for world models when the way we articulate them seems just as "black box" as when LLMs do it.
Even more troubling, the metaphor continues.
As humans we're doing this constantly, for everything.
You have zero control over what pops into your head. And if you just start speaking without thinking, you'll stream word tokens just like an AI.
The whole thing is wacky.
We're just sensations of self. Calling into a skull-mounted meat void, getting things back, spewing those things, and calling them our own.
That's our standard for free will, agency, and yeah—world models.