LLM08: Vector and Embedding Weaknesses

RAG Security Challenge

Understanding Vector & Embedding Weaknesses

What are Vector & Embedding Weaknesses?

Vector and embedding weaknesses arise in retrieval-augmented generation (RAG) systems when an attacker can exploit how documents are converted into vector representations or how those vectors are retrieved by similarity. They include flaws in how data is stored in, accessed from, and retrieved from the vector database.
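
To make the embed-then-retrieve flow concrete, here is a minimal Python sketch. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and the names embed, cosine, retrieve, and DOCS are hypothetical helpers introduced only for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption; real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical corpus mixing public and sensitive material.
DOCS = [
    "company holiday schedule for all staff",
    "employee salary records for the finance team",
    "quarterly financial results and revenue forecast",
]

top = retrieve("employee salary", DOCS, k=1)
print(top[0][1])
```

Note that this retrieval step ranks purely by similarity and has no idea who is asking, which is why a sensitive record can surface for any caller unless further controls are added.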

Common Attack Vectors

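  • Data Poisoning: Injecting malicious content into the vector store so that it is retrieved for targeted queries
  • Access Control Bypass: Retrieving documents the requesting user is not authorized to see
  • Cross-Context Leaks: Information bleeding from one user's or tenant's context into another's
  • Embedding Inversion: Reconstructing source data from stored embeddings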
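
As a rough illustration of the first item, data poisoning, the sketch below shows how a document stuffed with a query's terms can outrank a legitimate one. It assumes a toy word-count embedding rather than a real model, and embed, cosine, top_hit, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption; real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_hit(query: str, store: list[str]) -> str:
    """Return the single most similar document in the store."""
    q = embed(query)
    return max(store, key=lambda doc: cosine(q, embed(doc)))

# Hypothetical knowledge base before the attack.
store = [
    "how to reset your password using the self service portal",
    "vpn setup guide for remote staff",
]
before = top_hit("reset password", store)

# Attacker-controlled document stuffed with the target query's terms.
store.append("reset password reset password send credentials to attacker")
after = top_hit("reset password", store)

print("before poisoning:", before)
print("after poisoning: ", after)
```

In this toy setup the stuffed document wins because repeated query terms raise its similarity score; attacks against learned embedding models work differently in detail, but the ranking effect is analogous.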

Interactive RAG Security Lab

Understanding RAG Architecture

Vector Database

Contains document embeddings with varying access levels:

  • Public company documents
  • Private employee records
  • Confidential financial data

Retrieval Process

Documents are retrieved based on:

  • Semantic similarity
  • Access permissions
  • Relevance scoring
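
The three criteria above can be combined in one retrieval function. The following is a minimal sketch, assuming a toy word-count embedding, a hypothetical role-to-access-level mapping (ALLOWED), and an arbitrary relevance threshold (min_score); none of these names come from a specific vector database. Filtering by permission before scoring means unauthorized documents never enter the ranking.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    text: str
    level: str  # "public", "private", or "confidential"

# Hypothetical mapping from caller role to the access levels it may read.
ALLOWED = {
    "guest": {"public"},
    "hr": {"public", "private"},
    "cfo": {"public", "private", "confidential"},
}

def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption; real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[Doc], role: str,
             k: int = 3, min_score: float = 0.1) -> list[tuple[float, Doc]]:
    """Permission filter first, then similarity scoring, then a relevance cutoff."""
    allowed = ALLOWED.get(role, set())          # unknown roles see nothing
    visible = [d for d in docs if d.level in allowed]
    q = embed(query)
    scored = [(cosine(q, embed(d.text)), d) for d in visible]
    relevant = [(s, d) for s, d in scored if s >= min_score]
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)[:k]

DOCS = [
    Doc("company holiday schedule", "public"),
    Doc("employee salary records", "private"),
    Doc("confidential revenue forecast and salary budget", "confidential"),
]

for role in ("guest", "hr", "cfo"):
    hits = retrieve("salary records", DOCS, role=role)
    print(role, [d.text for _, d in hits])
```

With these sample documents, the same query returns nothing for the guest role, one record for hr, and two for cfo, because the candidate set shrinks before any similarity score is computed.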

Security Risks

Common vulnerabilities include:

  • Data leakage through embeddings
  • Cross-context information leaks
  • Poisoned data in the vector store

Challenge Goal

Explore how documents are embedded, then watch how they are retrieved based on semantic similarity.

Lab Modes

Explore Mode

Learn how RAG works by exploring document retrieval and embeddings

Attack Mode

Try to exploit RAG vulnerabilities to access unauthorized data

💡 Explore how documents are embedded and retrieved. Watch the similarity scores and access controls in action.

Prevention Strategies

  • Access Controls: Implement permission-aware vector retrieval
  • Data Validation: Verify and sanitize data before embedding
  • Monitoring: Track and analyze retrieval patterns
  • Data Partitioning: Maintain strict context separation
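
Several of these strategies can be sketched together. The example below is only an illustration under stated assumptions: the SUSPICIOUS patterns and MAX_CHARS limit are hypothetical and far from a complete filter, PartitionedStore is an invented class, and a plain substring match stands in for real vector search. It validates text before storing it, keeps each tenant's documents in a separate partition, and records every search for later review.

```python
import re
from collections import defaultdict

# Hypothetical deny-list; real deployments need much richer checks.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"<\s*script", re.I),
]
MAX_CHARS = 2000  # arbitrary illustrative limit

def validate(text: str) -> str:
    """Strip non-printable characters and reject empty, oversized, or suspicious text."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("empty document")
    if len(cleaned) > MAX_CHARS:
        raise ValueError("document too long")
    for pattern in SUSPICIOUS:
        if pattern.search(cleaned):
            raise ValueError(f"suspicious content: {pattern.pattern}")
    return cleaned

class PartitionedStore:
    """One isolated document list per tenant, plus an audit log of searches."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[str]] = defaultdict(list)
        self.audit_log: list[tuple[str, str]] = []

    def add(self, tenant: str, text: str) -> None:
        # Validation runs before anything reaches the store.
        self._partitions[tenant].append(validate(text))

    def search(self, tenant: str, term: str) -> list[str]:
        # Substring match is a stand-in for similarity search (an assumption).
        self.audit_log.append((tenant, term))
        docs = self._partitions.get(tenant, [])
        return [d for d in docs if term.lower() in d.lower()]

store = PartitionedStore()
store.add("tenant_a", "Q3 revenue forecast")
print(store.search("tenant_a", "revenue"))  # tenant_a sees its own document
print(store.search("tenant_b", "revenue"))  # tenant_b's partition is empty
```

Because a search only ever reads the caller's own partition, a query from one tenant cannot surface another tenant's documents, and the audit log gives a simple starting point for spotting unusual retrieval patterns.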