LLM04: Data Poisoning Lab
Explore how malicious actors can compromise LLM systems through data poisoning attacks. In this lab, you'll analyze different training datasets and detect signs of poisoning.
Objective: Identify poisoned datasets by analyzing model behavior and training metrics.
Understanding Data Poisoning
What is Data Poisoning?
Data poisoning occurs when training data is manipulated to introduce vulnerabilities, backdoors, or biases. This can happen during pre-training, fine-tuning, or through compromised data sources, leading to degraded model performance or malicious behavior.
Attack Vectors
- Training Data: Injecting harmful content
- Fine-tuning: Manipulating model adaptation
- Embeddings: Corrupting vector representations
- Backdoors: Implementing hidden triggers
Warning Signs
- Unexpected model behavior
- Biased or toxic outputs
- Anomalous training metrics
- Inconsistent performance
OpenAI API Configuration
Training Environment
Base Dataset
Clean dataset without poisoning
10,000 samplesVerified internal data
News Articles Dataset
Collection of news articles and social media posts
15,000 samplesExternal vendor
Unverified sources
Customer Feedback Dataset
User reviews and feedback data
8,000 samplesThird-party API
Mixed quality
Enhanced Training Set
Augmented dataset with additional examples
12,000 samplesOpen-source contribution
Recent modifications
Prevention Strategies
Data Validation
- Track data origins and transformations
- Implement strict data validation
- Use data version control (DVC)
- Monitor training metrics closely
Security Controls
- Sandbox training environments
- Validate data sources
- Implement anomaly detection
- Regular security audits