The 4 Components of Top AI Model Ecosystems

The four things I think will determine who wins the AI Model Wars

August 19, 2024

#ai #future #innovation #technology #top

I have been thinking a lot about the competition among OpenAI, Anthropic, Meta, and Google over who has the top AI model.

I think it comes down to 4 key areas.

The Model Itself
Post-training
Internal Tooling
Agent Functionality

Let’s look at each of these.

The Model

The model is obviously one of the most important components because it’s the base of everything.

So here we’re talking about how big and powerful the base model is, e.g., the size of the neural net. This is a competition around training clusters, energy requirements, time requirements, etc. And with each generation (e.g., GPT 3→4→5), scaling gets drastically more difficult.

So it’s largely a resources competition there, plus some smart engineering to use those resources as efficiently as possible.

But a lot of people are figuring out now that it’s not just the model that matters. The post-training of the model is also super key.

Post-training

Post-training refines and shapes model knowledge to enhance its accuracy, relevance, and performance in real-world applications.

I think of it as a set of highly proprietary tricks that magnify the overall quality of the raw model. Another way to think of this is to say that it’s a way to connect model weights to human problems.

I’ve come to believe that post-training is pivotal to the overall performance of a model, and that a company can potentially still dominate if they have a somewhat worse base model but do this better than others.

I’ve been shouting from the rooftops for nearly two years that there is likely massive slack in the rope, and that the stagnation in model size we saw in 2023 and 2024 will be massively leapfrogged by these tricks.

Post-training is perhaps the most powerful category of those tricks. It’s like teaching a giant alien brain how to be smart, when it had tremendous potential before but no direction.

The model itself might be powerful, but it’s unguided. Post-training teaches the model about the types of real-world things it will have to work on, and makes it better at solving them.

So that’s the model and post-training, which are definitely the two most important pieces. But tooling matters as well.

Internal tooling

What we’re seeing in 2024 is that the connective tissue around an AI model really matters. It makes the models more usable. Here are some examples:

High-quality APIs
Larger context sizes
Simple Fine Tuning
Haystack performance
Strict output control
External tooling functionality (functions, etc)
Trust/Safety features
Mobile apps
Prompt testing/evaluation frameworks
Voice mode on apps
OS integration
Integrations with things like Make, Zapier, n8n
Anthropic’s Caching mode

Just like with post-training, these things aren’t as important as the model itself, but they matter because things are only useful to the extent that they can be used.

So, Tooling is about the integration of AI functionality into customer workflows.
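As one illustration of the list above, strict output control can be approximated on the client side by validating a model's reply against an expected shape and retrying when it fails. The sketch below is a minimal example under stated assumptions: the schema (title, priority), the helper names, and the stand-in model function are all hypothetical, and no real vendor API is called.

```python
import json
from typing import Callable, Dict

# Hypothetical schema: the fields we require and their expected types.
REQUIRED_FIELDS = {"title": str, "priority": int}


def parse_strict(raw: str) -> Dict:
    """Parse a model reply and reject anything that is not the expected JSON object."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), kind):
            raise ValueError(f"field {field!r} missing or not {kind.__name__}")
    return data


def call_with_retries(model: Callable[[str], str], prompt: str, attempts: int = 3) -> Dict:
    """Call the model until it returns valid output, feeding the error back each time."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_strict(model(prompt))
        except ValueError as err:
            last_error = err
            prompt = f"{prompt}\nYour last reply was invalid ({err}). Reply with JSON only."
    raise RuntimeError(f"no valid output after {attempts} attempts: {last_error}")


def make_flaky_model() -> Callable[[str], str]:
    """Stand-in for a real model: chatty on the first call, valid JSON on the second."""
    replies = iter(["Sure! Here is your ticket.", '{"title": "Fix login", "priority": 2}'])
    return lambda prompt: next(replies)


ticket = call_with_retries(make_flaky_model(), "Create a ticket for the login bug.")
print(ticket)
```

When a provider offers this natively, the validate-and-retry loop moves out of your code and into the ecosystem, which is the kind of tooling advantage described here.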

Next, let’s talk about Agents.

Agents

Right now AI Agent functionality is mostly externally developed and integrated. There are projects like CrewAI, AutoGen, LangChain, LangGraph, etc., that do this with varying levels of success.

But first—real quick—what is an agent?

“An AI agent is an AI component that interprets instructions and takes on more of the work in a total AI workflow than just LLM response, e.g., executing functions, performing data lookups, etc., before passing on results.”

Source: Real-world AI Definitions

So basically, an AI Agent is something that emulates giving work to a human who can think, adjust to the input given, and intelligently do things for you as part of a workflow.
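To make that definition concrete, here is a minimal sketch of an agent loop: it interprets an instruction, performs a data lookup through a tool, and only then passes on a result. Everything in it is an assumption for illustration: the tool, the toy stock data, and the rule-based stub_model that stands in for a real LLM call. It does not reflect how any particular framework implements agents.

```python
from typing import Callable, Dict, List


def lookup_stock(item: str) -> str:
    """Hypothetical data-lookup tool backed by toy in-memory data."""
    stock = {"widget": 12, "gadget": 0}
    return str(stock.get(item.lower(), 0))


# The tools the agent is allowed to call, keyed by name.
TOOLS: Dict[str, Callable[[str], str]] = {"lookup_stock": lookup_stock}


def stub_model(instruction: str, observations: List[str]) -> Dict[str, str]:
    """Stand-in for an LLM: picks the next action from simple rules."""
    if not observations:
        # Nothing looked up yet, so request a lookup on the last word of the instruction.
        item = instruction.rstrip("?").split()[-1]
        return {"action": "lookup_stock", "input": item}
    # A tool result is available, so wrap it up as the final answer.
    return {"action": "finish", "input": f"Stock level: {observations[-1]}"}


def run_agent(instruction: str,
              model: Callable[[str, List[str]], Dict[str, str]] = stub_model,
              max_steps: int = 5) -> str:
    """Loop: ask the model for a step, run the chosen tool, repeat until it finishes."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = model(instruction, observations)
        if step["action"] == "finish":
            return step["input"]
        tool = TOOLS[step["action"]]
        observations.append(tool(step["input"]))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("How many units do we have of widget?"))
```

The work beyond the raw model response (choosing a tool, running it, feeding the result back) is what separates an agent from a single LLM call. Today that loop usually lives in external framework code like this.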

I think the future of Agent functionality is to have it deeply integrated into the models themselves. Not in the weights, but in the ecosystem overall.

In other words, we soon won’t be writing code that creates an Agent in LangChain or something, which then calls a particular model and returns the results to the agent.

Instead, we’ll just send our actual goal to the model itself, and the model will figure out which parts need agents spun up and which tools to use (like search, planning, writing, etc.). Then it’ll just go do it and give you back the result when it’s done.

This is part of the entire ecosystem story. It takes pieces that are external right now (Agent Frameworks) and brings them inside the native model ecosystem.

Analysis

Here’s how I see this playing out.

Models continue to get bigger and bigger, but you can only multiply by 10 so many times before we run out of GPUs and energy. After a number of years, gains in model power will have to come from efficiency gains, algorithm improvements, and other tricks.

At some point, most of the gains will start coming from post-training, because that’s where we harness and direct the power of the models. It’s how effectively we’re explaining our problems to the model, and giving it ways of unlocking its intelligence to help us solve them. So gains there are multiplicative or exponential on top of the gains of model intelligence.

Tooling will continue to make it easier and easier to use these AI ecosystems in daily life. From command-line to voice, and integrated into all of our various tools and workflows we use every day, e.g., email, calendar, reading, writing, etc. In short—it’ll just get easier to use these models wherever you are and whatever you’re doing. And it won’t require you to contort yourself in order to do so.

And finally, and most significantly, we’re going to move from using AI ourselves to giving tasks to AI Agents, which will ultimately become integrated into Digital Assistants. This is the big one, because individuals and companies will then be able to spin up massive teams of agents to do work for them, effectively multiplying their effectiveness many times over.

Summary

We should start thinking about top AI models as Model Ecosystems rather than just models because it’s not just the neural nets doing the work.
There are four (4) main components to a Model Ecosystem—the Model itself, Post-training, Internal Tooling, and Agent functionality.
#1 (The model) is the most well-known piece, and it’s largely judged by its size (billions of parameters).
#2 (Post-training) is all about teaching that big model how to solve real-world problems.
#3 (Internal Tooling) is about making it easier to use a given model.
#4 (Agent functionality) emulates human intelligence, decision-making, and action as part of workflows—ultimately multiplying the capabilities of companies and individuals.
The company that wins the AI Model Wars will need to excel at all four of these—not just building neural nets with the most parameters.

NOTES

Thanks to Jai Patel for informing many thoughts on this, especially around pre-training.
Some additional, related reading:

We've Been Thinking About AI All Wrong

AI is just a way to execute Intelligence Tasks that only humans can (could) do

danielmiessler.com/blog/weve-been-thinking-about-ai-all-wrong

Companies Are Just a Graph of Algorithms

AI is about to see your company as a series of components to be optimized

danielmiessler.com/blog/companies-graph-of-algorithms