Using Reflections to Compress LLM Context Data

One way to turn gigabytes of data into a few kilobytes of LLM context

As I’ve discussed in previous posts, traditional software is about to be replaced by LLM-based software using SPQA. In short, rather than having traditional databases and traditional queries, we’ll just have LLMs that consume context about entities.

If it’s HR software, it’ll be people and employment context. If it’s security software, it’ll be context about systems and people and identities and permissions. And if it’s sales software, it’ll be context around prospects and opportunities.

LLMs thrive on context, but they are easily overwhelmed by too much prompt and embedding input.

One challenge companies will face along their SPQA journey, however, is how to get from gigabytes, terabytes, or petabytes of data to something that can be consumed by LLMs. You can’t simply pipe your entire AWS state, or your endpoint logs, into an LLM. It’ll fall over. And that’s assuming you can afford the LLM processing time.

Reflections

Stanford put out a wonderful paper about what they call Generative Agents, which exist in an open virtual world, have their own personalities, and grow based on free interactions with other agents and the environment.

The whole paper was spectacular, but the part that really struck me was the Memory and Reflections bit. Note: check out a replay of part of the simulation.

Complex behaviors can be guided by agents’ recursive synthesis of recordings into higher-level observations. The agent’s memory stream is a database that contains a complete account of the agent’s prior experiences. To adapt to its shifting surroundings, the agent can access relevant data from its memory stream, process this knowledge, and formulate an action plan.

This is extraordinary. Let me try to translate, or at least give my interpretation.

Lots of things are happening to a given agent. They see things. They have interactions. Events take place around them. That’s the first bucket, which is basically Observations.

Then there is another bucket, which is Reflections. These are periodic reviews of those observations that culminate in a thought, or a, um…reflection…about what was seen or experienced.

Let’s say a neighbor has a dog that keeps pooping in your yard. And they play loud music past 10pm on a regular basis. But the one time you tried to have a party in your backyard, during the day on a Saturday, they called the cops.

You’d have observations:

  1. Their dog pooped on my lawn the first time

  2. Their dog pooped on my lawn a second time, sometime later

  3. Then a third and a fourth time

  4. Then there’s the first time my kid got woken up on a school night by music that was too loud

  5. And the second and third times

  6. And then they called the cops on us three weeks later

These can all be turned into a Reflection.

My neighbor is an asshole.

That might not seem useful, but humans do something similar. As Kahneman and others have talked about extensively, humans often use shortcuts when remembering things. When you think about whether you like your neighbor, you don’t recall every incident—good and bad—from the last 14 years they lived next to you. Instead you use an emotional heuristic that gives you kind of a thumbs-up or thumbs-down based on all the interactions.

We often can’t remember what happened, but we can remember what we thought about it and how it made us feel.

I’m no expert on human memory, but this feels a lot like a compression mechanism that saves space and processing power. Is this person dangerous? Should I do business with them? Etc. You often don’t have time in those moments to rehash the entire history. You need a heuristic.
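
To make the mechanics concrete, here’s a minimal sketch of that observations-to-reflection step in Python. The `MemoryStream` structure and the `call_llm` stub are my own illustrative names, not anything from the Stanford paper; you’d wire the stub to whatever model API you actually use.

```python
# A minimal sketch of compressing Observations into a Reflection.
# MemoryStream and call_llm are illustrative stand-ins, not a real API.

from dataclasses import dataclass, field


def call_llm(prompt: str) -> str:
    """Stand-in for your actual model call (OpenAI, Anthropic, a local model, etc.)."""
    return "My neighbor is inconsiderate."  # replace with a real completion call


@dataclass
class MemoryStream:
    observations: list[str] = field(default_factory=list)
    reflections: list[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        self.observations.append(event)

    def reflect(self) -> str:
        # Periodically compress the raw observations into one higher-level
        # thought, then keep only the compressed form.
        prompt = (
            "Synthesize these observations into one short reflection "
            "about what they mean:\n" + "\n".join(self.observations)
        )
        reflection = call_llm(prompt)
        self.reflections.append(reflection)
        self.observations.clear()
        return reflection


stream = MemoryStream()
stream.observe("Their dog pooped on my lawn (fourth time)")
stream.observe("Loud music past 10pm woke my kid on a school night (third time)")
stream.observe("They called the cops on our Saturday afternoon party")
print(stream.reflect())
```

The key move is the `clear()` at the end: once the reflection is written down, the raw observations can be aged out, which is exactly where the compression comes from.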

Applying Reflections to real-world applications

Ok, cool. So what’s the analog to LLMs, and LLM-based software?

Easy: Event compression. Log compression. Data compression. SPQA State compression. Like taking a massive time series, extracting the meaning from it, and sending that to an LLM.

We can’t send the state of AWS to an LLM. It’s a maze. And it’s changing constantly. We can’t monitor OSQuery for all hosts in the environment and send that to an LLM continuously. It’s this way with the entire business. The data are too numerous and too chatty. We need a way to compress those raw events down to something usable for LLMs.

Reflections will be a major part of that story, at least early on.

Examples

Here are some examples of this:

  • Characterizing a user:

    • Julie is a senior developer and she tends to work between these hours

    • She mostly interacts with these systems

    • During performance review season she also uses these systems

  • Characterizing a system:

    • This system contains mostly schematics for the FooProduct

    • It’s usually accessed by these types of users, during these hours

    • It gets backed up to here, using this method

    • It has these ports open

    • It uses this authentication

    • It has the following shared resources available

    • They tend to be used this amount

  • Characterizing a threat actor:

    • This entity tends to probe us from the following set of IPs

    • They tend to use these types of probes

    • This behavior is associated with the following known actors

    • Their nexus appears to be X, but possibly Y

    • We should look for the following TTPs and see if they have things in common with this actor’s behavior

  • Characterizing activity:

    • We’re seeing a burst in outbound traffic to addresses with no prior traffic

    • We’re seeing more people using Gmail drafts that have attachments

    • We’re seeing more DNS traffic with large payloads

    • Lots of people are currently using and talking about LinkedIn, which is happening right after the bad news about the stock price

  • Characterizing a market:

    • Our competitors seem to be pivoting to small microservices vs. large product launches, and that’s catching on

    • Perhaps we should release some of our internal tooling as a microservice that customers can use?

  • Characterizing a culture:

    • Employees seem to be taking advantage of the open leave policy in the Portland office

    • We should let managers there know to slightly tighten their tolerances on what gets approved vs. not

    • We should watch for abuses of policies that benefit the whole that end up costing the company money and lost trust with employees

  • Characterizing a trend:

    • We’re seeing a trend of less-developed documentation prior to releases, and PRDs that are less vetted with the broader team

    • This seems associated with more customer complaints about quality issues, and more rework being done within GitHub

    • We should consider raising the bar for PRD quality and reviews

These are just a few hasty examples. The point is that computers are good at looking at lots of events and distilling them into something smaller and more meaningful.
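
One way these might be stored: each characterization becomes a small structured record. The field names below are my guess at a reasonable shape, not an established schema; `evidence_ids` is there because keeping pointers back to the raw events is one way to soften the lossiness discussed below.

```python
# One possible shape for a stored Reflection. All field names are illustrative.

from dataclasses import dataclass


@dataclass
class Reflection:
    entity: str              # e.g. "julie", "fileserver-07", "actor-4412"
    kind: str                # "user" | "system" | "threat_actor" | "trend" ...
    statement: str           # the compressed, human-readable claim
    evidence_ids: list[str]  # pointers back to raw events, for attribution


julie = Reflection(
    entity="julie",
    kind="user",
    statement=(
        "Senior developer; usually active 9am-6pm; mostly touches the build "
        "systems, plus the review tooling during performance review season."
    ),
    evidence_ids=["evt-0119", "evt-2047", "evt-9001"],
)
```

A few hundred records like this can stand in for millions of raw events when you’re assembling context for a prompt.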

Imagine a multi-phased process where you go from gigabytes or terabytes of data, through various legacy-tech consolidation/compression processes that prune duplicates and look for needles.

Think of Reflections as LLMs looking down at everything and saying to themselves, “Hmm, that’s interesting”…and then writing it down somewhere for us.

Those can then be put into a second/third process that classifies them using more classical ML. And perhaps the final step takes a much smaller number of highly-refined events and gets them to the LLM for analysis.
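
Here’s a hedged sketch of that funnel, with made-up phase names; the “classical ML” step is faked with a simple keyword filter just to show where a real classifier would slot in.

```python
# An illustrative three-phase funnel: cheap consolidation first, classical
# ML triage second, and only the refined residue ever reaches the LLM.

from collections import Counter


def call_llm(prompt: str) -> str:
    """Stand-in for your actual model call, as in the earlier sketch."""
    return "Reflection: burst of outbound traffic to previously unseen hosts."


def phase1_consolidate(raw_events: list[str]) -> list[str]:
    # Legacy-tech compression: collapse exact duplicates into counts.
    counts = Counter(raw_events)
    return [f"{event} (x{n})" for event, n in counts.most_common()]


def phase2_triage(events: list[str], limit: int = 50) -> list[str]:
    # A real pipeline would use a trained classifier here; this keyword
    # filter is just a placeholder for that ranking step.
    interesting = [e for e in events if "denied" in e or "outbound" in e]
    return interesting[:limit]


def phase3_reflect(events: list[str]) -> str:
    # Only the small, highly-refined remainder is sent to the expensive LLM.
    prompt = "What pattern do these events suggest?\n" + "\n".join(events)
    return call_llm(prompt)


events = ["outbound conn 10.0.0.5 -> 203.0.113.9"] * 900 + ["login denied for julie"] * 3
print(phase3_reflect(phase2_triage(phase1_consolidate(events))))
```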

Like most compression, there is loss here. We can lose the original events in the process in a way that makes it difficult to go backward and do attribution. So that’s something to think about. There will be use cases that are horrible for Reflections, but I think there will be many more where they’re incredibly useful.

How do you know what to keep and what to discard?

Well, that’s the question, isn’t it?

Expect hundreds of companies to spring up to work on this problem. Companies are in the world of terabytes and petabytes, and LLMs are in the world of kilobytes and megabytes.

In order to make full use of LLM-based software, that gap needs to be closed.

Summary

Eventually this will turn into permanent pipelines flowing into continuous custom model training (and fine-tuning). And Reflections will be somewhat less useful when you don’t need that compression.

But that’ll be a while. Continuous training of company-scale custom models will be cost-prohibitive for most companies for many years, requiring us to continue to rely on large-context prompting, vector embeddings, and Reflections.

At least that’s how I’m seeing it. Let me know if I’ve missed something.