Prompts Are The New Malware As Enterprise AI Defenses Fall Behind
In its 2026 Global Threat Report, CrowdStrike revealed that prompt injection attacks targeted over 90 organizations in 2025, with these manipulated prompts used to execute commands that resulted in credential theft and cryptocurrency loss. CrowdStrike emphasized a fundamental shift, describing prompts as the new form of malware. The report also highlighted a significant 89% year-over-year increase in AI-enabled adversarial activities, noting that 82% of intrusions occurred without traditional malware. This rise coincides with enterprises transitioning from simple chatbots to more advanced tools like agents, copilots, and browser automations that access sensitive resources such as email, code, payments, and file shares.
Prompt injection has topped the OWASP Top 10 for large language model (LLM) applications twice consecutively under the designation LLM01. OWASP points to a core vulnerability: language models cannot consistently distinguish between developer instructions and content sourced from web pages, emails, or documents. This ambiguity has evolved from an academic concern into a tangible operational risk, with named adversaries, CVE assignments, and lab-confirmed breaches emphasizing the severity.
Prompt injection manifests primarily in two ways. Direct injection occurs when a user inputs commands overriding system prompts, such as instructing a chatbot to disregard previous instructions. More insidiously, indirect injection involves attackers embedding malicious instructions into content later parsed by the model on behalf of an unsuspecting user. Such carriers include emails, collaboration pages, calendar invites, webpages, or uploaded documents, enabling execution without direct user interaction or attacker-model contact.
Two notable disclosures illustrate this threat. In August 2024, PromptArmor identified an attack exploiting Slack AI’s workspace access to exfiltrate private data including API keys, despite attackers lacking channel membership, achieved through planted instructions in public channels or files. The next year, Aim Security revealed EchoLeak (CVE-2025-32711), representing the first documented zero-click prompt injection in a production AI system. A single crafted email could prompt Microsoft 365 Copilot to extract and transmit internal files to an attacker-controlled server without user actions. Although patches addressed these vulnerabilities, the broader attack class persists.
The attack surface continues to expand within the agentic ecosystem. Agents managing mail, cloud infrastructure, or code accept their contextual information as authoritative. Retrieval Augmented Generation (RAG) pipelines ingest compromised documents and web pages. Persistent agent memory stores malicious instructions, surfacing them repeatedly. Enterprises employing multi-model request routing become susceptible to coercion toward weaker model paths.
Vendor defenses have recognized the challenge but admit fundamental limits. OpenAI publicly acknowledged in late 2025 that prompt injection, akin to scams and social engineering, is unlikely to be completely eradicated. It developed reinforcement-learning attackers to proactively discover injection methods internally to feed ongoing adversarial training. Anthropic found its Claude Opus 4.6 agent withstood injection attempts only partially, with success rates climbing dramatically without or even with existing defenses. Google reported that adversarially fine-tuned Gemini models still succumbed to effective attacks over half the time.
As a result, security advisories have become increasingly cautious. Gartner urged CISOs to block AI browsers including ChatGPT Atlas and Perplexity Comet, citing risks like indirect injection and credential leakage compounded by immature defenses. Despite warnings, a significant portion of organizations already allowed some AI browsers. Similar cautionary advice emerged from the UK National Cyber Security Centre and Germany’s BSI.
Current defenses falter because language models conflate instructions and data into a single text channel. Typical security techniques—input validation, output filtering, signature detection, patching—rely on clearly distinguishing trusted commands from untrusted content, a separation not feasible inside LLMs. Vendor-implemented safeguards address common patterns but leave many variants vulnerable. Detection methods struggle with obfuscated, multilingual, or image-encoded injections. Adversarial training yields temporary resilience that attackers repeatedly circumvent. Even minimal failure rates become consequential due to frequent agent execution.
Frameworks like NIST AI 600-1 acknowledge prompt injection as a security risk but focus on policy rather than technical mandates. OWASP introduced controls for agentic applications tackling Goal Hijack and Memory Poisoning, but these guidelines remain advisory.
According to CrowdStrike, 65.3% of organizations lack dedicated prompt injection defenses beyond vendor provisions, policies, and training. Such measures sufficed when AI engagement was limited to chatbots but are inadequate for advanced agents controlling sensitive functions. Effective defense requires external controls: limiting agent privileges strictly to necessary functions, demanding human approval for critical actions like sending mail or executing code, tagging sensitive data retrieval to prevent inclusion in RAG, network allowlisting agent egress domains, and maintaining detailed audit trails that enable replay of all key agent decisions.
For procurement, CISOs should ask vendors four essential questions: how often and with which classifiers they detect prompt injections, published attack success rates for single and multiple attempts, which OWASP-identified vulnerabilities their products effectively mitigate with working controls, and whether security teams can reliably replay the inputs and operations leading to significant agent activities.
Enterprises adopting AI must operate under the assumption that injected instructions will sometimes be executed. True security depends on controls positioned outside the model itself. Trusting the language model as a security boundary risks enabling credential theft cloaked in a user-friendly interface.