HealthBench - UserTrace

clinical evaluation

Benchmarking GPT-4o, Claude Sonnet 4.6, MedGemma 4B, and MedGemma 27B as healthcare AI assistants across 500+ simulated patient conversations.