How AI is Eating the Software World


Our discussions around Generative AI are focused on the wrong thing, and it’s causing us to miss what’s about to happen. We’re infatuated with what it can do, like getting better search results, creating custom art, or becoming an interactive teacher. It’s all really exciting, but it’s distracting us from the advancement that makes it all possible.

That advancement is understanding. Generative AI is an unfortunate misnomer. Yes, it’s generating things, but the name ignores the critical prerequisite of needing to understand before it can create.

I don’t use the word understand lightly; it’s the most important word in this essay. But what does it actually mean for software to understand something? Here’s how I think about the progression from data to understanding, and this is the definition I’m using to back my claim that large GPT models actually understand.

  • Data: Raw facts or figures without context

  • Information: Data that has been processed, organized, or structured

  • Knowledge: Information that has been analyzed and integrated

  • Understanding: The ability to apply knowledge in new situations and contexts

Or:

  • Data is raw

  • Information is organized

  • Knowledge is integrated

  • Understanding is applied

People are defining understanding as special simply because it’s human, which is begging the question. You have to start with a definition and then see whether it applies.

So do Generative Pre-trained Transformers really understand things? There are endless pundits out there saying no:

GPTs are nothing but next character predictors.

Or:

They’re just filling in the next letter of text based on what they’ve been trained on! It doesn’t really understand anything.

They’re half-right. The models are just deciding the next best character to write based on their training, but so is Prince when he does an off-the-cuff guitar solo. If you watch it in slow motion, he’s just going from one note to the next, in order. But the only reason he can do it is all the music he’s listened to in his life and all the hours he’s practiced. He can do it because he understands guitar.

GPTs are similar. When they are told to:

Write a romantic poem from Luke to Leia in the style of Shakespeare.

…they’re also “just completing the next character” in a piece of text. But look what they can do.

O Leia, fairest of all maidens fair,

Thy beauty doth my very soul ensnare.

Thy shining eyes, like stars that light the night,

Doth fill my heart with love’s sweet, shining light.

 

Thy gentle touch, like feathers on my skin,

Doth make my heart to dance and leap within.

Thy voice, like music to my eager ears,

Doth soothe my soul and calm my deepest fears.

 

And though our love may be forbidden, true,

I cannot help but give my heart to you.

For thou art all I ever could desire,

My Leia, thou hast set my heart on fire.

 

So let us steal away into the night,

And let our love take wing and take to flight.

For in thy arms, I know I shall find peace,

And all the world’s troubles shall surely cease.

Generated by ChatGPT

This was made one letter at a time, but in order to do so it had to first learn and understand the following non-trivial subjects:

  1. The English language

  2. How poets write

  3. How Shakespeare writes

  4. Star Wars

  5. The fact that Luke and Leia are siblings

  6. The concept of forbidden love

Looking at our definition (“the ability to apply knowledge in new situations and contexts”), that’s precisely what just happened. You simply cannot write such a poem (much less an infinite number of them) without understanding the underlying concepts. And it’s the same for any sufficiently trained GPT. The point isn’t that they generate. The point is that they understand.

Software, before and after

So what about software? What is software’s purpose? What does it actually do, from a first-principles perspective? And why is our current software in imminent danger from this new type of understanding AI? I’d say software’s purpose is something like “providing an interface to information and action in order to create understanding and pursue results.” So:

  • Interfaces to information: the ability to store information in organized structures that we can query

  • Interfaces to action: connecting inputs and outputs to action, such as sending emails

  • Creating understanding: when we get the information back it creates knowledge and understanding inside human brains

  • Pursue results: we configure the software such that its execution moves us closer to higher-level outcomes we’re trying to achieve

Generative AI is eating our existing software not because it does some things better than legacy software, but because it’s working at a completely different layer of the stack. Rather than working with information and action, AI deals in understanding and outcomes. Those are the two maturity models: the Understanding axis, which runs from data to understanding, and the Outcomes axis, which runs from task to outcome.

I recommend all of Stephen Few’s books on metrics, reporting, and dashboards. They’re the best out there.

At Apple I had a team focused on creating security insights for the company. The goal I gave the team was to move up the intelligence stack over time. Meaning the better we got, the more we’d move from providing data, to providing information, then to knowledge, and finally to “intelligence” that was ready for a decision. So it was a data “enrichment” process, arriving at the form that was as close as possible to action.

This has, until now, been a uniquely human capability. On the Understanding axis, going from satellite images, intercepted phone communications, and log files to an analysis and recommendation for a general has been unreachable for computers, because it required too much background knowledge of the world. Actually, too much understanding of the world.

It’s the same for doing something complex in software, like finding the best possible thing to say to a customer to get them to buy something. The software could help the human by providing as much supporting information as possible in Salesforce, but ultimately it came down to the human knowing exactly what to say in the email, or exactly which spot to recommend for a perfect dinner.

Motion on the Outcomes axis is similar, and is currently mostly handled by humans. We can easily ask today’s software to email a provided list of people, or to do a series of tasks such as 1) query the database for those who currently work in Boise, and then 2) email those customers with the Boise_Invite template. But a human had to come up with those steps, and then program a computer to execute them.
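
To make that concrete, here’s a minimal sketch of what the human-authored version of those two steps looks like today. It assumes Python, a hypothetical customers table, and a hypothetical Boise_Invite template file; the point is that a person had to decide the steps, their order, and every detail of the query and template.

```python
# A minimal sketch of the human-programmed version of this workflow.
# The table, column, template, and address names are all hypothetical.
import sqlite3
import smtplib
from email.message import EmailMessage

def invite_boise_customers(db_path: str) -> None:
    # Step 1, authored by a human: query the database for people in Boise.
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name, email FROM customers WHERE city = ?", ("Boise",)
    ).fetchall()
    conn.close()

    # Step 2, also authored by a human: email them the Boise_Invite template.
    with open("templates/Boise_Invite.txt") as f:
        template = f.read()

    with smtplib.SMTP("localhost") as smtp:
        for name, email in rows:
            msg = EmailMessage()
            msg["From"] = "events@example.com"
            msg["To"] = email
            msg["Subject"] = "You're invited"
            msg.set_content(template.format(name=name))
            smtp.send_message(msg)
```

Nothing in that sketch understands anything. It only executes the steps a human already figured out.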

Everything unfolds from understanding.

Moving from basic task execution to driving towards an Outcome has always required a human. You first have to understand the desired outcome, and then you have to understand the entire world well enough to break that outcome into discrete tasks. Finally, you have to know which tasks must be done first, which depend on others, and so on. Definitely human territory.

This is why AI is about to eat our legacy software. GPTs just jumped to the top of both the Understanding and Outcomes axes. And for many domains, the moment they got as good as humans is the same moment that they surpassed them.

A new software architecture

I highly recommend Andrej Karpathy’s essay, Software 2.0 on how software will become neural net weights.

Using this model of Understanding and Outcomes, there’s a clear path to AI being vastly better than traditional software at most software-related tasks. That’s because you get better at both Understanding and Outcomes the more you know about the world as a whole. This is true whether you’re selling something to someone, and therefore must perfectly understand both them as a person and the right words to use, or whether you’re trying to perfectly understand your company’s goals and OKRs for the year, and the actions within your company that are needed to get there.

Either way, the more you understand about people, and startups, and enterprises, and sales, and Jira, and email, and meetings, and Slack, etc.—the better you can both create and execute plans to achieve your desired outcomes. And that understanding is precisely the thing that GPTs are so good at.

This suggests a new way of thinking about software post-GPT, which I break into three main pillars.

  1. STATE, which is the current state of the universe, i.e., the data and telemetry that best represents it.

  2. POLICY, which is your desired state and the set of things you want and don’t want to happen, and

  3. ACTION, which is the recommendations or actions that can be performed to bring the STATE in line with the POLICY.

All of this sits on top of one or more large models similar to GPT-N.

The game then becomes getting the highest-quality telemetry possible about the current state of the thing you care about, whether that’s a business, or a person’s health, or the functioning of a city. You then define exactly what you want for that thing, like we want this startup to grow at 50% or more each year, and we want to keep costs below X number of dollars, or we want our city’s population to have contentment scores above 85%. And finally, we have recommended (and soon automated) actions that flow out of the combination of those two things (plus the underlying GPT-N models).
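
One way to picture the three pillars is as a small data model sitting on top of a large model. This is only a sketch under those assumptions; the field names and the llm() helper are illustrative, not a real API. The key idea is that ACTION is derived from STATE plus POLICY by the model, rather than entered by hand.

```python
# A rough sketch of STATE, POLICY, and ACTION as data structures. Field names
# and the llm() helper are illustrative assumptions, not a real API.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Stand-in for whatever underlying large model (GPT-N, etc.) you call."""
    raise NotImplementedError("wire this up to your model of choice")

@dataclass
class State:
    """Current state of the thing you care about: telemetry, docs, metrics."""
    telemetry: dict = field(default_factory=dict)   # e.g. {"annual_growth": 0.31}

@dataclass
class Policy:
    """Desired state: goals and anti-goals, stated in plain language."""
    goals: list = field(default_factory=list)       # e.g. ["Grow 50%+ per year"]
    anti_goals: list = field(default_factory=list)  # e.g. ["Never exceed $X in costs"]

@dataclass
class Action:
    """A recommendation (or automated step) that moves STATE toward POLICY."""
    description: str

def recommend_actions(state: State, policy: Policy) -> list:
    """The GPT-N layer: given STATE and POLICY, propose ACTIONs to close the gap."""
    prompt = (
        f"Current state: {state.telemetry}\n"
        f"Goals: {policy.goals}\nAnti-goals: {policy.anti_goals}\n"
        "Propose the next actions, one per line."
    )
    return [Action(line.strip()) for line in llm(prompt).splitlines() if line.strip()]
```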

There will obviously be a legion of security and privacy concerns around ingesting so much raw data to train these models, and they’ll need to be addressed.

Let’s take an email security startup as an example: the STATE is everything happening in the company. Every piece of documentation. Every Slack message. Every email. Every text conversation. Every meeting. Every voice conversation. Their financials. Their bank balances. Their equity structure. Every pertinent external reaction by the public, by the market, etc. Basically everything. As much as possible.

The POLICY is what that company is trying to do. And not do. It’s like goals and anti-goals. How fast they want to grow. How many employees they want to have. Whether they want to work remote or in offices, or some combination thereof. Their values. Their culture. Their short-term milestones. Their OKRs. Their KPIs. Example:

> We are a remote-first culture. We want to have 2% email security market share by 2027. We will never do business with companies based out of Mississippi. We have unlimited PTO and a 4-day work week.

Although it won’t be long before POLICY is also being recommended and adopted.

The company’s RECOMMENDATION/ACTION components are not explicitly entered. They are generated by the various models working on the combination of the company’s STATE and POLICY. And remember, this could be a startup like above, but it could also be a school project, a plan to get ready for a half marathon, or a way to rejuvenate the state of Wyoming.

Then, being at the very top of the Outcomes axis, the models will create detailed action plans for achieving everything in the policy. And once the models are connected to all the functional systems, such as email, calendar, docs, etc., they’ll actually perform those actions.

Some examples:

  • Writing the company strategy

  • Writing the OKRs

  • Tracking the OKRs based on what people actually do and accomplish

  • Writing the Quarterly Business Review (it’s updated daily so it’s always ready)

  • Writing the presentation for the board

  • Writing the latest investor deck

  • They’ll create the meeting agendas

  • They’ll create the Slack channels

  • They’ll invite all the right people

  • They’ll consume everything that happened from the day before, and condense that into 5 quick bullet points for all leaders at 9am the following day

  • It’ll write the job descriptions for new hires

  • It’ll send the emails in the perfect tone to prospective candidates

  • It’ll filter the responses

  • It’ll set up the meetings to do the in-person interviews

  • It’ll build the perfect interview challenges based on what the company currently needs most

  • It’ll collect feedback from the hiring committee

  • It’ll perform a full analysis of all the feedback and make a hiring recommendation

  • It’ll onboard them

  • It’ll send the swag pack to their house

  • And it’ll answer all their onboarding questions during the first two weeks, and forever

In this AI-based software stack of STATE, POLICY, and ACTION, the models run the show. They do so because they have far more understanding of all the relevant pieces—and their billions of interaction points—than any human possibly could. The main limitations of such a system are 1) the quality and frequency of the data coming in that produces the STATE, and 2) the opinionated culture, guidance, and decisions present within the POLICY.

I/O flexibility

Current software has a critical limitation that we no longer notice due to familiarity. If you want to add a new piece of functionality, you have to add new inputs, possibly add new data, and then create new outputs. For example, in our email startup, if we want to allow customers to submit their own filtering rules, we have to figure out how they’ll upload those rules and write an interface for that. Then we have to parse what they send us. Then we have to translate that into our own internal rule language. And when something triggers with that new functionality, we might need another interface or workflow to do something with it.
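
As a rough sketch of how much hand-built machinery that implies, here’s what the legacy path for those customer rules tends to look like (the rule fields and formats here are hypothetical). Every stage is written and maintained by a person, and each one breaks the moment the customer’s format drifts.

```python
# A sketch of the legacy rule-ingestion pipeline. Every stage is hand-written,
# and the rule fields and formats below are hypothetical.
import json

def parse_uploaded_rules(raw: str):
    """Stage 1: parse whatever the customer uploaded (assumed to be JSON)."""
    return json.loads(raw)

def to_internal_rule(rule: dict) -> dict:
    """Stage 2: translate each customer rule into our internal rule language."""
    return {
        "match": {"header": rule["field"], "contains": rule["pattern"]},
        "verdict": "quarantine" if rule["action"] == "block" else "allow",
    }

def ingest_customer_rules(raw_upload: str) -> list:
    """Stage 3: only rules that survive both stages make it into the engine."""
    return [to_internal_rule(r) for r in parse_uploaded_rules(raw_upload)]
```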

So our current software has a very finite number of inputs and outputs, which must be strictly maintained. Same with the data stores themselves. They’re the information behind those inputs and outputs, and if their formal and pedantic structures aren’t continuously and meticulously manicured, the whole thing stops working. This is severely limiting. And fragile.

AI software using something like an SPA architecture doesn’t have that problem. There you basically have this entity that understands your business, period. It has linked everything together. It knows your company’s goals. It knows your finances. It knows who’s doing what. It knows what cybersecurity threats are. It knows how Splunk logs relate to those threats. It knows what your most critical applications are. And in fact it can make you a list of them, just by looking at what’s in your STATE and POLICY. It fully understands nearly every aspect of your business.

Most importantly, the interface to its understanding is infinite. It’s just questions. If you have 100 questions, you get 100 answers. If you ask ten times that many you get ten times the results. And you didn’t have to write new software to ask different types of questions for finance, or project management, or security. To the models, it’s all just understanding. You also didn’t have to update the backend data. That’s all handled at the STATE layer by continuous ingestion and retraining. Your interfaces in and out of your company’s brain are now infinite, in the form of human language.
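
Here’s a small sketch of what that interface collapses to. ask_company_brain() is a placeholder for whatever model endpoint sits on top of your STATE and POLICY, and the questions are just examples.

```python
# Once the understanding lives in the model, the interface is just questions.
# ask_company_brain() is a placeholder for your STATE/POLICY-trained model.
def ask_company_brain(question: str) -> str:
    return f"(model's answer to: {question})"  # stand-in for a real model call

questions = [
    "What are our five most critical applications, ranked by business impact?",
    "Which open security projects don't map to any of this quarter's OKRs?",
    "How did last month's burn compare to the cost limits in our policy?",
]

# One more question is one more line; no new schema, endpoint, or UI required.
answers = {q: ask_company_brain(q) for q in questions}
```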

To get a feel for how powerful this is, imagine you’re a security org within a medium-sized company and you have a functioning SPA architecture. You’ll be able to give a command like this:

> Give me a list of our most critical applications from a business and risk standpoint, create a prioritized list of our top threats to them, and correlate that with what our security team is spending its time and money on. Make recommendations for how to adjust our budget, headcount, OKRs, and project list to properly align to our actual threats. Then write up an adjusted security strategy using this new approach. Define the top 5 KPIs we’ll track to show progress towards our goals. Build out the nested OKR structure that flows from that strategy given our organizational structure. Create an updated presentation for the board describing the new approach. Create a list of ways we’re lacking from a compliance standpoint given the regulations we fall under. Then create a full implementation plan broken out by the next four quarters. Finally, write our first Quarterly Security Report, and keep that document updated every time the model re-runs.

The models are already created and available, so that entire plan will take a few seconds to output. Oh, and to do it again you just ask it again.

The future of software is asking smart questions to a mesh of APIs running layered models in something like a STATE, POLICY, ACTION (SPA) architecture.

What to expect

So how is this going to play out?

The seminal work on GPTs was a paper called Attention Is All You Need, by Ashish Vaswani et al., in 2017.

Keeping in mind that ChatGPT just came out a few months ago, there’s currently a major obstacle to everyone jumping to an SPA architecture next week. The tech for training large, custom models is still nascent and expensive. Lots of things make custom models challenging, but the biggest ones are:

  1. Context/Prompt limits, i.e., how much data you can get into a new, company-specific model that sits on top of a massive LLM like GPT-N

  2. How many different types of data you can include (text, images, voice, video, etc.)

  3. How long it’ll take to train on each cycle

  4. How much it’ll cost each time you train

Technologies like LangChain for adding high quantities of custom context are super interesting, and building these custom models to be as large, fast, and cheap as possible will be one of the fastest-moving subfields in the space.

As an example of such an API chain, I have one that pulls a security incident article webpage via the command line, extracts the relevant article from it, classifies the incident in multiple ways, and outputs the result as JSON.
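
A simplified sketch of that kind of chain might look like the following, where complete() stands in for whatever model API you call, and the prompts and output fields are illustrative.

```python
# A simplified sketch of a command-line chain: fetch page -> extract article ->
# classify incident -> emit JSON. complete() is a placeholder for your model API.
import json
import sys
import urllib.request

def complete(prompt: str) -> str:
    """Stand-in for a call to your LLM of choice."""
    raise NotImplementedError("wire this up to your model of choice")

def classify_incident(url: str) -> dict:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    article = complete("Extract only the article text from this HTML:\n" + html)
    raw = complete(
        "Classify this security incident. Return JSON with keys attack_type, "
        "affected_sector, and severity (low/medium/high):\n" + article
    )
    return json.loads(raw)

if __name__ == "__main__":
    print(json.dumps(classify_incident(sys.argv[1]), indent=2))
```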

My other major thought on this progression is that companies are about to become predominantly APIs. Both internally and externally. Internal functions become APIs, and external offerings become APIs. All backed by the company’s SPA stack. The APIs will work like UNIX pipes, where you can take the output from one and pipe it into another. Interfaces are critical, but they’re not the important IP for a company, and I think they’ll start decoupling from the core functionality offered by companies. There will be companies that specialize in UI/UX, and they will provide the interfaces to a web of APIs offered by other companies.

This AI-assistant piece is more like 3-5 years out, as opposed to the rest of this that’s happening already.

This doesn’t mean companies won’t need people. It still takes people and normal company things like meetings and conversation to make a company work. And there will be lots of new ways that pop up to differentiate one company from another when APIs are the primary interaction point. But it will be a major shift when—a bit further down the line—digital assistants are making the requests on behalf of their users. So they’re the ones querying the APIs, managing the interface that the user experiences, etc. At that point, the value of the company is the quality of results in the API.

The next significant challenge to current, day-0 GPTs is that of non-determinism. It’s fine to get ten different poems when you ask ten different times, because that’s the nature of creativity, but it’s clearly a problem if you’re managing major aspects of your company using the same tech. Finances? Legal decisions? Security questions? Analysis and decision-making in these critical areas need to be trustworthy, and that requires consistency.

This will help address the “hallucination” problem as well.

One way this will be addressed is through the chaining of multiple AI layers that perform separate functions. And you can use them together to ensure you get a consistent and high-quality result. I’m already doing this in my own work, where I call one GPT-powered API to create a thing, another to check it for X, another to check it for Y, and another to do final analysis. And I use different personas for each step, e.g., “You’re an auditor.”, “You’re a QA specialist.”, etc. Then you can add some strict, static validation steps to the mix as well. The combination of multiple AI layers, strict validation rules, plus the inevitable consistency improvement of the models will ultimately remove this obstacle. But it might take a while, and in the meantime we’ll have to be careful what we do with results.
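
A sketch of that layering might look like this, with the personas, prompts, and the final static check all illustrative, and complete() again standing in for a model call.

```python
# Layering model calls with different personas, then finishing with a strict,
# deterministic validation step. complete() is a placeholder for your model API.
def complete(system_persona: str, prompt: str) -> str:
    """Stand-in for a model call that takes a system persona and a prompt."""
    raise NotImplementedError("wire this up to your model of choice")

def generate_with_checks(task: str, max_words: int = 500) -> str:
    draft = complete("You're a technical writer.", task)

    # Separate layers check the draft from different angles.
    audit = complete("You're an auditor.", "List factual problems in:\n" + draft)
    qa = complete("You're a QA specialist.", "List clarity problems in:\n" + draft)

    final = complete(
        "You're a senior editor.",
        f"Rewrite the draft to address these findings.\n"
        f"Draft:\n{draft}\nAudit findings:\n{audit}\nQA findings:\n{qa}",
    )

    # Strict, static validation the models can't talk their way past.
    if len(final.split()) > max_words:
        raise ValueError("output exceeds the length limit; rejecting this run")
    return final
```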

How to get ready

This isn’t a 1-3 year thing. It’s not even a 6-month thing. This is happening right now. So what can you do to get ready? That obviously depends on who you are and what you’re doing, but here are some opinionated universal recommendations.

Start thinking about your business’s first principles. Ask yourself very seriously what you provide, how it differs from competitor offerings, and what your company will look like when it becomes a set of APIs that aren’t accessed by customers directly. Is it your interface that makes you special? Your data? Your insights? How do these change when all your competitors have equally powerful AI?

Start thinking about your business’s moat. When all this hits fully, in the next 1-5 years, ask yourself what the difference is between you doing this, using your own custom models stacked on top of the massive LLMs, vs. someone like McKinsey walking in with The Solution™. It’s 2026 and they’re telling your customers that they can simply implement your business in 3-12 months by consuming your STATE and POLICY. Only they have some secret McKinsey sauce to add because they’ve seen so many customers. Does everyone end up running one of like three universal SPA frameworks?

Mind the Innovator’s Dilemma. Just because this is inevitable doesn’t mean you can drop everything and pivot. The question is—based on your current business, vertical, maturity, financial situation, etc.—how are you going to transition? Are you going to do so slowly, in place? Or do you stand up a separate division that starts fresh but takes resources from your legacy operation? Or perhaps some kind of hybrid. This is about to become a very important decision for every company out there.

Focus on the questions. When it becomes easy to give great answers, the most important thing will be the ability to ask the right questions. This new architecture will be unbelievably powerful, but you still need to define what a company is trying to do. Why do we even exist? What are our goals? Even more than your STATE, the content of your POLICY will become the most unique and identifying part of your business. It’s what you’re about, what you won’t tolerate, and your definition of success.

Summary

  1. GPT-based AI is about to completely replace our existing software

  2. GPTs work because they actually understand their subject matter

  3. Software will be rewritten using an AI-based STATE, POLICY, ACTION structure

  4. The SPA architecture will manifest as clusters of interoperable, AI-backed APIs

  5. Businesses need to start thinking now about how they’ll survive the transition

Notes

  1. Thank you to Saul Varish, Clint Gibler, Jason Haddix, and Saša Zdjelar for reading early versions of this essay and providing wonderful feedback.

  2. Check out Andrej Karpathy’s essay titled Software 2.0. It’s brilliant. LINK

  3. The seminal paper behind GPTs, by Ashish Vaswani et al.: Attention Is All You Need. LINK

  4. I expect much of the RSA floor to be decimated by this transition. They’ll either get eaten up immediately by eager competitors or eventually replaced by the McKinsey BigModel™ types as they come out.

  5. I wrote a book back in 2016 that talks about this and even further steps. It’s not a great book, but it’s very short and does a good job capturing the essential ideas. You can get a copy here. LINK.

  6. The title is a hat tip to Marc Andreessen’s Why Software is Eating the World. LINK

  7. I’ll be writing a number of follow-ups to this piece focused around implications, specific internet companies and projects currently working on various parts of the idea, and practical examples. If you’re interested you should sign up for weekly updates here or below. LINK

  8. The essay’s top image was created by Daniel Miessler, using Midjourney 4.