Rootconf Mini 2024

Geeking out on systems and security since 2012

Tickets

Loading…

Sachin

@sachindh

Malicious Hallucinations: Hidden Threats with Indirect Prompt Injection

Submitted Oct 15, 2024

Large language models (LLMs) are known to generate unintended inaccurate responses, often called hallucinations. Most of these are harmless mistakes, like Google AI Overview suggesting to eat a rock a day. There’s a more concerning possibility: what if an attacker could deliberately cause specific hallucinations? This could allow the stealthy spread of targeted disinformation.

Our talk will introduce the concept of indirect prompt injections and show how malicious documents can be crafted to trigger particular hallucinations when added to a vector database. We’ll demonstrate case studies of proof-of-concept attacks on popular LLM chatbots like Notion AI, Microsoft Copilot, and Slack AI. Finally, we’ll explore secure design principles and tools to defend against these hidden threats.

This talk is designed for a diverse audience in the AI field. It will be particularly valuable for AI engineers working on LLM applications, AI security engineers focused on protecting these systems, product managers overseeing AI-powered products, and C-level executives making strategic decisions about AI implementation and security. Whether you’re hands-on with the technology or guiding its use at a high level, you’ll gain crucial insights into this emerging threat and its implications.

Key takeaways:

  1. Understand the mechanics of indirect prompt injections and their potential for causing targeted hallucinations in popular LLM applications
  2. Learn practical strategies and tools to protect AI systems against malicious hallucinations and enhance overall AI security

Comments

{{ gettext('Login to leave a comment') }}

{{ gettext('Post a comment…') }}
{{ gettext('New comment') }}
{{ formTitle }}

{{ errorMsg }}

{{ gettext('No comments posted yet') }}

Hybrid Access Ticket

Hosted by

We care about site reliability, cloud costs, security and data privacy