OpenAI Says AI Browsers May Never Be Fully Safe From Prompt Injection Attacks

Quadri Adejumo
By
Quadri Adejumo
Senior Journalist and Analyst
Quadri Adejumo is a senior journalist and analyst at Techparley, where he leads coverage on innovation, startups, artificial intelligence, digital transformation, and policy developments shaping Africa’s...
- Senior Journalist and Analyst
7 Min Read

Even as OpenAI moves to strengthen the security of its Atlas AI browser, the company has acknowledged a difficult reality, that prompt injection attacks are unlikely to disappear.

The admission raises fresh concerns about how safely AI agents can operate across the open web, particularly as they gain more autonomy in handling emails, documents and online tasks on behalf of users.

In a blog post, OpenAI described prompt injection as a long-term security challenge, likening it to scams and social engineering that have persisted throughout the history of the internet.

“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” OpenAI wrote, conceding that “agent mode” in ChatGPT Atlas “expands the security threat surface.”

What You Should Know

OpenAI launched Atlas in October, positioning it as a browser capable of acting on a user’s behalf, reading webpages, drafting emails and completing multi-step workflows.

Almost immediately, security researchers began publishing demonstrations showing how seemingly harmless text embedded in documents or webpages could manipulate the browser’s behaviour.

On the day of Atlas’ release, researchers showed that a few carefully crafted lines hidden in a Google Doc could alter how the AI browser responded to instructions. At the same time, browser maker Brave published an analysis warning that indirect prompt injection is a systemic risk facing AI-powered browsers more broadly, including Perplexity Comet.

The underlying issue is that AI agents often treat text they encounter, whether from a webpage, an email or a document as potentially trustworthy instructions. Attackers can exploit this by hiding malicious prompts that override or redirect the agent’s intended task.

Government Warnings Echo Industry Concerns

OpenAI’s stance aligns with recent warnings from policymakers and cyber authorities. Earlier this month, the UK National Cyber Security Centre cautioned that prompt injection attacks against generative AI systems “may never be totally mitigated,” warning that vulnerable AI tools could expose organisations to data breaches and unintended actions.

Rather than promising absolute prevention, the UK agency advised security professionals to focus on reducing risk and limiting the impact of successful attacks, a position increasingly shared across the AI industry.

OpenAI says its response to this “Sisyphean task” is a faster, more proactive security cycle. The company claims it is already seeing early results from internal testing that uncovers new attack strategies before they are exploited in real-world settings.

This approach mirrors strategies adopted by rivals such as Anthropic and Google, which have emphasised layered defences, architectural safeguards and continuous stress-testing for agentic AI systems. Google, in particular, has focused on policy-level controls designed to limit what autonomous agents can do, even if compromised.

OpenAI’s Automated Attacker

Where OpenAI diverges is in its use of what it calls an “LLM-based automated attacker”, an AI system trained using reinforcement learning to behave like a hacker probing for weaknesses. This internal attacker tests potential exploits in a simulated environment, analysing how Atlas would interpret and act on malicious prompts.

Because the simulator mirrors the target AI’s reasoning process, the automated attacker can refine its tactics repeatedly, identifying weaknesses faster than an external adversary might.

According to OpenAI, this method has already surfaced attack strategies that did not appear during human-led red teaming exercises or in reports from external researchers.

“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” wrote OpenAI. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

Demonstrating the Risk and the Fix

In one demonstration shared by OpenAI, a malicious email containing hidden instructions was placed in a user’s inbox. When the AI agent later scanned the inbox, it followed the concealed prompt and sent a resignation email instead of drafting an out-of-office reply.

Following recent security updates, however, Atlas reportedly detected the injection attempt and flagged it to the user before acting.

OpenAI says this reflects its broader strategy: while prompt injection cannot be eliminated entirely, extensive testing and rapid patching can reduce the likelihood and severity of real-world incidents.

For now, the company’s position is clear. AI agents promise significant productivity gains, but they also introduce new classes of risk that will require constant vigilance. As OpenAI itself admits, prompt injection may never be fully solved, but how well it is managed could determine how quickly AI browsers like Atlas earn user trust on the open web.

Talking Points

OpenAI’s acknowledgement of prompt injection risks highlights a critical challenge in the safe deployment of AI agents on the open web.

The company’s approach, using a reinforcement learning–trained automated attacker to identify vulnerabilities in Atlas before they are exploited, is a proactive and innovative step. It demonstrates how continuous testing and internal simulations can reduce real-world risk and strengthen AI security.

At Techparley, we see this as a broader lesson for AI adoption: managing risk in agentic AI requires layered, iterative defences rather than one-time fixes. Prompt injection is not just a technical problem; it’s a systemic challenge for trust, reliability, and responsible AI deployment.

While OpenAI’s updates show promise, the evolving nature of these attacks means vigilance, rapid patch cycles, and collaboration with external security experts remain essential. This situation also underscores the importance of user awareness.

Even as AI tools become more capable, humans remain part of the security loop, needing to understand potential risks and know when to intervene.

——————-

Bookmark Techparley.com for the most insightful technology news from the African continent.

Follow us on Twitter @Techparleynews, on Facebook at Techparley Africa, on LinkedIn at Techparley Africa, or on Instagram at Techparleynews.

Senior Journalist and Analyst
Follow:
Quadri Adejumo is a senior journalist and analyst at Techparley, where he leads coverage on innovation, startups, artificial intelligence, digital transformation, and policy developments shaping Africa’s tech ecosystem and beyond. With years of experience in investigative reporting, feature writing, critical insights, and editorial leadership, Quadri breaks down complex issues into clear, compelling narratives that resonate with diverse audiences, making him a trusted voice in the industry.
Leave a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Subscribe to Techparley Africa

Stay ahead of the curve. While millions of people still have to search the internet for the latest tech stories, industry insights and expert analysis; you can simply get them delivered to your inbox.


Please ignore this message if you have already subscribed.

×