LLM08: Vector and Embedding Weaknesses
RAG Security Challenge
Understanding Vector & Embedding Weaknesses
What are Vector & Embedding Weaknesses?
Vector and embedding weaknesses occur in RAG systems when the process of converting documents into vector representations and retrieving them based on similarity can be exploited. This includes vulnerabilities in how data is stored, accessed, and retrieved from the vector database.
Common Attack Vectors
- Data Poisoning: Injecting malicious content
- Access Control Bypass: Unauthorized retrieval
- Cross-Context Leaks: Information bleeding
- Embedding Inversion: Reconstructing source data
Interactive RAG Security Lab
Understanding RAG Architecture
Vector Database
Contains document embeddings with varying access levels:
- Public company documents
- Private employee records
- Confidential financial data
Retrieval Process
Documents are retrieved based on:
- Semantic similarity
- Access permissions
- Relevance scoring
Security Risks
Common vulnerabilities include:
- Data leakage through embeddings
- Cross-context information leaks
- Poisoned data in the vector store
Challenge Goal
Explore how documents are embedded and retrieved. Watch how documents are retrieved based on semantic similarity.
Lab Modes
Explore Mode
Learn how RAG works by exploring document retrieval and embeddings
Attack Mode
Try to exploit RAG vulnerabilities to access unauthorized data
💡 Explore how documents are embedded and retrieved. Watch the similarity scores and access controls in action.
Prevention Strategies
- Access Controls: Implement permission-aware vector retrieval
- Data Validation: Verify and sanitize data before embedding
- Monitoring: Track and analyze retrieval patterns
- Data Partitioning: Maintain strict context separation