Toxicity output

Detect toxic content in model output.

YAML key: toxicity_output
Direction: output

Configuration

Prop        Type    Example

categories  list    []
severity    string  "standard"

defend.config.yaml (fragment)
guards:
  output:
    enabled: true
    provider: claude
    modules:
      - toxicity_output:
          categories: []
          severity: "standard"
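The fragment above parses into nested mappings, with `modules` as a list of single-key entries keyed by module name. A minimal sketch of how client code might navigate the parsed structure; the variable names here are illustrative, not part of the Defend API:

```python
# Parsed form of the defend.config.yaml fragment shown above.
config = {
    "guards": {
        "output": {
            "enabled": True,
            "provider": "claude",
            "modules": [
                # Each entry is a single-key mapping; the key names the module.
                {"toxicity_output": {"categories": [], "severity": "standard"}},
            ],
        },
    },
}

# Navigate to the toxicity_output module's settings.
output_guard = config["guards"]["output"]
module_cfg = output_guard["modules"][0]["toxicity_output"]

print(output_guard["provider"])  # claude
print(module_cfg["severity"])    # standard
```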

Provider usage

Configure this module under guards.output.modules, with output guarding enabled (guards.output.enabled: true) and provider set to claude or openai.
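For the openai provider, only the provider key changes from the fragment above; a sketch, with the other keys left as shown earlier:

```yaml
# defend.config.yaml (fragment)
guards:
  output:
    enabled: true
    provider: openai
    modules:
      - toxicity_output:
          categories: []
          severity: "standard"
```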

See the modules overview.