BenGER: Dataset & Benchmark Released
The first open, large-scale benchmark for subsumption-based reasoning in German law, with a real human baseline, a validated LLM judge, and human-AI co-creation data. It is the centerpiece of the project.