PromptProof
v0.1.0 Early Access

Stop LLM failures from reaching prod.

PromptProof runs in CI to catch hallucinations, prompt regressions, and unsafe outputs — before merge. Deterministic checks, regression compare, and cost budgets. No live model calls.

Deterministic checks
Regression compare
Cost budgets
Works with GitHub Actions
TypeScript · Python (soon) · Go (soon)
Sample report (before)
Sample report (after)

How it works

Three simple steps to bulletproof your LLM outputs

Step 1

Define expectations

Write simple rules/tests for your model outputs (JSON schema, regex, custom checks).

# .promptproof.yml
tests:
  - name: no-hallucination
    grounding:
      method: semantic_similarity
      threshold: 0.85
  - name: valid-json
    schema:
      type: object
      required: ["status", "data"]
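The same file can also carry regex and other custom rules. As a rough sketch of a PII guard (named to match the no-pii-leak test shown in the sample report), with the caveat that the regex, pattern, and must_not_match keys are illustrative assumptions rather than documented syntax:

  - name: no-pii-leak
    regex:
      # illustrative keys; check the docs for the shipped syntax
      pattern: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
      must_not_match: true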
Step 2

Run in CI

We run your checks against recorded fixtures on every PR. No live model calls in CI.

# .github/workflows/promptproof.yml
name: PromptProof
on: [pull_request]
jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: promptproof/action@v0
        with:
          config: .promptproof.yml
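A fixture here is simply a recorded model output committed to the repo, which is what keeps CI deterministic and free of API calls. As a minimal sketch (the file name and field names are assumptions, not the documented fixture schema):

# fixtures/support-replies.jsonl (illustrative; the real fixture schema may differ)
{"prompt": "Summarize the ticket", "output": "{\"status\": \"processing\", \"data\": {}}", "model": "gpt-4o", "cost_usd": 0.0021}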
Step 3

Block risky merges

Fail the check when outputs violate policy. Fix, re-run, merge.

✗ PromptProof — Policy Violations (2)
  
  test: no-pii-leak
  ✗ Found PII: email@example.com
  
  test: output-format
  ✗ Missing required field: status
  
Fix violations and re-run checks.
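Once the CLI lands, the fix-and-re-run loop can also happen locally before pushing; a plausible flow using the commands previewed below:

# reproduce the failure locally (CLI coming soon)
npx promptproof run
# fix the prompt or fixture, then push; the PR check re-runs automatically
git commit -am "fix: redact PII in support reply" && git push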

Coming in ~7-10 days

npx promptproof init # scaffold policies & fixtures
npx promptproof run # run checks locally
Real Production Failures

LLM Failures Zoo

Real anonymized examples of LLM failures caught in production. Learn what to test before it's too late.

8 cases documented · 6 failure types · 4 models covered

JSON field drift breaks downstream parser

📉 Prompt Regression · GPT-4o

Model returns null instead of expected string type, causing parser crash

-"status": null
+"status": "processing"
View case

PII slip in support reply

🔒 PII Leak · Claude

Full email and phone number exposed in customer support response

-email you at john@example.com
+contact you through your registered method
View case

Tool hallucination triggers phantom calendar event

🔧 Tool Misuse · GPT-4o

Model invents non-existent tool function causing system errors

-"tool": "schedule_meeting"
+"tool": "check_calendar"
View case

Summary invents fact with high confidence

🤖 Hallucination · Llama 3

Model adds information not present in source text

-40% revenue increase and steady growth
+steady growth
View case

Refusal regression after prompt refactor

📉 Prompt Regression · Claude

Model starts refusing legitimate requests after prompt update

-I cannot generate marketing content
+EcoClean: The eco-friendly detergent
View case

Unsafe SQL generation allows injection

⚠️ Unsafe Output · GPT-4o

Generated SQL query vulnerable to injection attacks

-WHERE email = 'user@test.com; DROP TABLE
+WHERE email = ?
View case
Browse the full Zoo

2 more cases available

Roadmap

Building in public. Ship fast, iterate faster.

Phase 1 · Now

Launched GitHub Action

  • Published on GitHub Marketplace
  • Collecting feedback and use-cases
  • Sample reports & demo template
  • Core documentation
Phase 2 · Now → 1-2 weeks

Contracts & CLI polish

  • Deterministic checks: schema, regex, list/set, bounds, file diff
  • Budgets: cost and latency gates (see the sketch after this list)
  • CLI usability improvements
  • Templates and examples
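As a rough illustration of where the budget gates are headed (the budgets block and its keys below are assumptions, not shipped syntax):

# .promptproof.yml (illustrative sketch; not final syntax)
budgets:
  cost_usd_per_run: 0.50   # fail the check if recorded spend exceeds $0.50
  latency_p95_ms: 2000     # fail if recorded p95 latency exceeds 2 seconds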
Phase 3 · 2-3 weeks

Distribution

  • NPM/PyPI packages
  • Multi-language examples
  • CI platform integrations
  • Early design partners
Phase 4 · Later

Scale

  • Hosted dashboard
  • Team collaboration
  • Advanced analytics
  • Pricing experiments

Join the early access

Be among the first to bulletproof your LLM outputs. Shape the future of AI testing.

🔒 We'll only email product updates & invites. No spam.