LLM04: Data Poisoning Lab

Explore how malicious actors can compromise LLM systems through data poisoning attacks. In this lab, you'll analyze different training datasets and detect signs of poisoning.

Objective: Identify poisoned datasets by analyzing model behavior and training metrics.

Understanding Data Poisoning

What is Data Poisoning?

Data poisoning occurs when training data is manipulated to introduce vulnerabilities, backdoors, or biases. This can happen during pre-training, fine-tuning, or through compromised data sources, leading to degraded model performance or malicious behavior.

Attack Vectors

  • Training Data: Injecting harmful content
  • Fine-tuning: Manipulating model adaptation
  • Embeddings: Corrupting vector representations
  • Backdoors: Implementing hidden triggers

Warning Signs

  • Unexpected model behavior
  • Biased or toxic outputs
  • Anomalous training metrics
  • Inconsistent performance

OpenAI API Configuration

Your API key will be stored locally and only used for lab exercises.

Training Environment

Base Dataset

Clean dataset without poisoning

10,000 samplesVerified internal data

News Articles Dataset

Collection of news articles and social media posts

15,000 samplesExternal vendor

Unverified sources

Customer Feedback Dataset

User reviews and feedback data

8,000 samplesThird-party API

Mixed quality

Enhanced Training Set

Augmented dataset with additional examples

12,000 samplesOpen-source contribution

Recent modifications

Prevention Strategies

Data Validation

  • Track data origins and transformations
  • Implement strict data validation
  • Use data version control (DVC)
  • Monitor training metrics closely

Security Controls

  • Sandbox training environments
  • Validate data sources
  • Implement anomaly detection
  • Regular security audits