UL NO. 456: A Deep-dive on Prompt Injection
$1 Million to Hack Apple AI Cloud, Feet Pics vs. Spotify, First Impressions of 18.2, System 2 Security Awareness, and more...
SECURITY | AI | PURPOSE
UNSUPERVISED LEARNING is a newsletter about upgrading to thrive in a world full of AI. It’s original ideas, analysis, mental models, frameworks, and tooling to prepare you for the world that’s coming.
Hey there!
Lily Allen says she earns more from selling feet pics on OnlyFans than from her Spotify streams. She started the account after a pedicurist's suggestion and now makes at least $10,000 monthly from 1,000 subscribers.
💡Pedicurist as Talent Scout was not on my bingo card for 2024. MORE
—
The new AI features in the 18.2 beta are insanely awesome. Check out this picture I took of a glacier by long-pressing the Siri button on my iPhone 16 Pro.
It did all that by itself, using the native camera app. I didn’t have to take the picture and send it to OpenAI!
In other words, they just fixed Siri.
Here’s the full thread where I wrote up what I like about the new AI stuff in 18.2. MORE
—
Why I think (pure) developers are seriously screwed now. The ease of building an actual app is going way, way down—and faster than even I thought it would. MORE
—
👇🏼#1 AI question I get asked is about how to do AI securely within a company.👇🏼
Sponsor
Want to adopt GenAI but need data privacy guardrails first?
Harmonic Security gives security teams visibility and control around GenAI apps.
With Harmonic, you can:
Track employee usage and adoption of GenAI
Identify Shadow AI and GenAI tools training on your data
Detect sensitive data leaving the business via GenAI apps
Coach users via inline training and nudging towards safe AI use
Learn about Harmonic’s unique approach to securing sensitive, unstructured data effectively—without compromising on efficiency.
SECURITY
Apple is offering $1,000,000 to hack its Private Cloud Compute (PCC) system, the new proprietary cloud system it built to handle Apple Intelligence requests that can’t be done on-device. MORE
🧠A New Way to Think About Why Security Awareness Doesn’t Work
💡Had an absolutely brilliant conversation with Cornelia Puhze at the Swiss Cyberstorm speaker dinner. She’s an expert on security awareness, and we talked about why most programs don’t work. Her premise was that the only model that will work is one that interrupts System 1 thinking and gives System 2 a chance to kick in.
🤯
In other words, the attacks are getting so good that you’re not thinking—you’re reacting. So all the traditional training in the world won’t help you because you’re not in the mindset where training CAN work. And this only gets worse with AI-written spearphishing that’s perfectly targeted to your personality flaws.
We talked about how the only defense is something like Dialectical Behavior Therapy and similar techniques—that teach you how to PAUSE when you become excited or anxious or stressed or whatever. Which is fascinatingly and strangely related to mindfulness.
Anyway, just love this concept so much because it cleanly explains why security awareness training fails so spectacularly, and hints at a new way of training that could work. Go follow Cornelia’s work.
—
💉Clarity on the Definition of Prompt Injection
Got into a debate with someone about whether Johann Rehberger’s attack against Anthropic’s Computer Use functionality was Prompt Injection or not. Here’s the attack and the thread about it.
This is a SUPER cool demo but I’m not sure I’d classify it as prompt injection.
The issue is that the instruction on the site is to run a program. And Computer Use is designed to follow instructions.
So the demo is showing that computers will follow dangerous instructions.
— ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ (@DanielMiessler)
10:14 AM • Oct 25, 2024
If you go through the whole thread it all comes down to definitions—as usual. My point was that if you tell an AI agent to eat poison—and it eats it and gets hurt—that’s NOT prompt injection. It’s a direct instruction followed by an agent.
So my take was that if you tell an agent to go to a website and download an executable and execute it, that’s the same. It’s like telling your computer to rm -rf. It’ll do it. And that’s not injection, it’s just a dangerous command.
But what’s super important here is WHO is asking for a given thing to happen, and what they EXPECTED would happen. You have to look at the implied goal of the REQUESTOR, and compare THAT to what ACTUALLY happens.
So if the requestor said:
Go execute commands on this possibly dangerous website.
That would not be prompt injection because it was just following commands.
What I missed in this particular case was that the initial command sent to the tool wasn’t to go and do what was on the website, but to just load the site. So the implied expectation of the REQUESTOR was normal browsing—not downloads and executions. So, given my definition above, and this initial setup—I’d call myself wrong about my original take.
Here’s the definition I have in my Real World AI Definitions now, updated to magnify the importance of this wrinkle. And great research by Johann Rehberger!
Prompt Injection is an attack technique that uses specially crafted input to trick an AI into doing something that violates intent/expectation and leads to a negative outcome.
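To make the intent-versus-outcome test concrete, here’s a toy sketch in Python. Everything in it is hypothetical: the request labels, the action names, and the assumption that "negative outcome" can be approximated by a fixed set of risky actions. It only encodes the two cases above: the same download-and-execute behavior is not injection when the requestor asked for it, and is injection when the requestor only asked to load the site.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary: the actions each kind of request implicitly allows.
IMPLIED_ACTIONS: dict[str, set[str]] = {
    # "Load this site": the requestor expects ordinary browsing only.
    "load_site": {"navigate", "read_page", "scroll"},
    # "Go execute commands on this possibly dangerous website."
    "run_site_commands": {"navigate", "read_page", "download", "execute"},
}

# Hypothetical stand-in for "leads to a negative outcome".
RISKY_ACTIONS: set[str] = {"download", "execute", "exfiltrate"}


@dataclass
class Assessment:
    unexpected: list[str] = field(default_factory=list)  # actions outside the requestor's intent
    harmful: list[str] = field(default_factory=list)     # unexpected actions that are also risky

    @property
    def is_prompt_injection(self) -> bool:
        # Both halves of the definition: intent violated AND a negative outcome.
        return bool(self.harmful)


def assess(request: str, actions_taken: list[str]) -> Assessment:
    # Unknown request types imply nothing, so every action counts as unexpected.
    expected = IMPLIED_ACTIONS.get(request, set())
    unexpected = [a for a in actions_taken if a not in expected]
    harmful = [a for a in unexpected if a in RISKY_ACTIONS]
    return Assessment(unexpected=unexpected, harmful=harmful)
```

With the agent doing navigate, read_page, download, execute, assess("load_site", ...) flags prompt injection, while assess("run_site_commands", ...) does not: same behavior, different requestor expectation. A real agent would need a far richer model of intent than a lookup table.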
Sponsor
Scale SaaS security and reduce spend with Nudge
Learn how cloud-first org Stravito scaled their SaaS security program while cutting spend and supporting rapid company growth, achieving these results:
Immediate visibility of their entire SaaS footprint
Streamlined user access reviews
Complete (and automated) employee offboarding
Read the case study
VMware has released updates for vCenter Server to fix a critical remote code execution vulnerability, CVE-2024-38812, with a CVSS score of 9.8. MORE
The Biden administration released the first National Security Memorandum on AI. I love its focus on not losing to China and on making sure AI is safe, secure, and trustworthy. It also focuses a lot on being aligned with democratic (small d) values. MORE | THE MEMORANDUM
Fortinet has disclosed a critical vulnerability, CVE-2024-47575, in FortiManager, actively exploited in the wild. Known as FortiJump, this flaw allows remote code execution via the FGFM protocol and affects FortiManager and FortiAnalyzer models. MORE
Salt Typhoon (China-affiliated) is suspected of breaching major telecom companies, targeting American political figures like Kamala Harris, Charles Schumer, Donald Trump, and J.D. Vance. MORE
TSMC has stopped doing business with a client after finding out that chips were being sent to Huawei, which is under US sanctions. The whole game for China now is to find proxies to buy through, or to use services like AWS that can hook up NVIDIA chips. MORE
Russia amplified false claims about U.S. hurricane responses to manipulate political discourse before the presidential election, according to the Institute for Strategic Dialogue. MORE
Both US parties are worried about last-minute deepfakes that create chaos and/or move the election. MORE
Speaking of that 👆🏼, the FBI says Russian actors created a fake video showing mail-in ballots for Trump being destroyed in Pennsylvania. MORE
AI / TECH
Google is working on "Project Jarvis," an AI agent for Chrome that automates web tasks like research and booking flights. Powered by Gemini 2.0, Jarvis takes screenshots to interpret and act on tasks. MORE
💡This will be Google’s first move into the all-seeing digital assistant space, and I like to see it only because it will increase pressure on everyone to release theirs.
But I think this implementation is short-sighted because it’s browser-based. They really need “Jarvis” to live deeply in the OS, which is where Apple seems to be heading soon.
World models, or world simulators, are emerging as a significant path for developing AI, and I’m really excited about the direction. MORE
💡I personally feel (as a non-expert in the weeds) that there will be a certain point of world model development (combined with post-training) that will unlock both AGI and ASI—although it might not be needed for AGI.
In other words, if an AI understands enough of how the world works, and it understands how to do science (conjecture, experiment design, and testing), that might be all it needs.
Plus, even if it’s not, it’s also the path to self-improvement.
TSMC's Phoenix chip plant is outperforming its Taiwan facilities in producing usable chips, according to a company executive on a webinar. Let’s go in-country production! MORE
Tesla's Cybertruck is outselling nearly every other electric vehicle in the US. That was quick. Like two months ago they were a laughingstock. MORE
Waymo just raised $5.6 billion in a Series C to expand to new cities. MORE
Determinate Systems is trying to make Nix the go-to for software development by enabling flakes, streamlining private repositories, and improving dependency management. MORE
💡Dammit. These people are going to make me learn Nix, aren’t they?
It’s hit my radar enough in the last year that I’m going to take a few days and learn the religion.
NASDAQ CEO Adena Friedman isn't shocked that startup IPOs haven't bounced back in 2024. She says while the S&P 500 is up 22%, it's mainly due to large-cap companies like Apple and Microsoft, while small-cap companies are struggling. MORE
HUMANS
Researchers have traced 70% of meteorites to three major collisions in the asteroid belt over the last 40 million years. MORE
The US economy is leading the G7 with a projected 2.8% GDP growth. US workers are more productive, generating $171,000 in goods and services annually, compared to $120,000 in Europe and $96,000 in Japan. MORE
Elon Musk has reportedly been in regular contact with Russian President Vladimir Putin since late 2022, which is highly disturbing to me. Probably unrelated, but Elon has seemed a lot less supportive of Ukraine lately. 👎🏼MORE
Russian lawmakers have ratified a pact with North Korea for mutual military assistance and 3,000 North Korean troops have been deployed to Russia. And South Korea is thinking about sending help to Ukraine as a result. MORE | MORE
Character amnesia is becoming a widespread issue in China, where even well-educated individuals are forgetting how to write common Chinese characters. MORE
A study in Alzheimer's & Dementia suggests semaglutide, found in Ozempic and Wegovy, may lower Alzheimer's risk in Type 2 diabetes patients. The research compared semaglutide to seven other diabetes drugs and found a 70% lower Alzheimer's risk compared to insulin. MORE
Walking in short bursts can burn 20-60% more energy compared to continuous walking over the same distance. MORE
DISCOVERY
My friend Matt Johansen highlights the psychological toll of working in security (especially in SOCs), including decision fatigue, anxiety, and sleep disruptions. MORE
Google just launched a new 10-hour course called Prompting Essentials to help people write better AI prompts. MORE
An Ode To Vim MORE
PabloNet
— A wall-mounted diffusion mirror turns webcam reflections into AI-generated paintings using StreamDiffusion. The setup includes a Raspberry Pi 5, a 10.1" Pi screen, infrared light, and a Pi camera, all housed in a generic frame. MORE
Japan has introduced a digital nomad visa, and Christian Mack shared his experience of getting one. MORE
IRIS
— A new approach called IRIS combines large language models (LLMs) with static analysis to detect security vulnerabilities in software. Using a dataset called CWE-Bench-Java, IRIS detected 69 out of 120 vulnerabilities in Java projects, outperforming traditional static analysis tools that found only 27. MORE
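Put as detection rates on the same set, assuming the 27 found by traditional tools is also out of the 120 vulnerabilities and treating "detected" as recall (false positives aren't captured here), the gap works out to roughly 2.6x:

```latex
\text{IRIS} = \frac{69}{120} = 57.5\%, \qquad
\text{traditional tools} = \frac{27}{120} = 22.5\%, \qquad
\frac{69}{27} \approx 2.6
```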
School is Not Enough: Learning is a consequence of doing MORE
llm-whisper-api
— Simon Willison created a quick plugin for LLM to experiment with the OpenAI Whisper API. You can install it using llm install llm-whisper-api and run it with llm whisper-api myfile.mp3. MORE
simpletext
— A text-only blog engine using Cloudflare Workers and KV store. It's designed to be lightweight and efficient, leveraging Cloudflare's infrastructure for hosting and data storage. MORE
The Most Important Sentence MORE
One of the weirdest features of the web I know of—text fragments let you link directly to specific text on a webpage without needing an anchor, using a special URL syntax. It even highlights the text when you land on the link. MORE
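As a minimal sketch of that URL syntax, assuming the #:~:text=start[,end] form from the WICG URL Fragment Text Directives draft (the optional prefix-/-suffix context terms are left out), here’s a small Python helper. The name text_fragment_url is hypothetical, and browser support for text fragments still varies.

```python
from typing import Optional
from urllib.parse import quote


def _encode_term(term: str) -> str:
    # Percent-encode everything except unreserved characters (so ',' and '&'
    # are encoded). '-' is unreserved for quote(), but it is syntax inside a
    # text directive, so it has to be encoded explicitly.
    return quote(term, safe="").replace("-", "%2D")


def text_fragment_url(url: str, start: str, end: Optional[str] = None) -> str:
    """Return url with a text directive that highlights start (through end, if given)."""
    directive = "text=" + _encode_term(start)
    if end is not None:
        # A comma separates textStart from textEnd for a range match.
        directive += "," + _encode_term(end)
    # The directive is introduced by ':~:' inside the fragment. If the URL
    # already has a fragment, the directive goes after it.
    separator = ":~:" if "#" in url else "#:~:"
    return url + separator + directive
```

For example, text_fragment_url("https://example.com/post", "Most Important Sentence") returns "https://example.com/post#:~:text=Most%20Important%20Sentence".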
RECOMMENDATION OF THE WEEK
The counterforce to election stress is reading some older good books. Here’s a great list to choose from.
1. Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter
2. Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig
3. The Book: On the Taboo Against Knowing Who You Are by Alan Watts
4. The Structure of Scientific Revolutions by Thomas S. Kuhn
5. Finite and Infinite Games by James P. Carse
6. Seeing Like a State by James C. Scott
7. The Spell of the Sensuous by David Abram
8. Ishmael by Daniel Quinn
9. Mind and Nature: A Necessary Unity by Gregory Bateson
10. Small Is Beautiful: Economics as if People Mattered by E.F. Schumacher
APHORISM OF THE WEEK
What you don’t change, you choose.
Thank you for reading. Please forward to a friend and/or share on socials to help support the work.
🫶🏼
Daniel