BenGER: Dataset & Benchmark Released
The first open, large-scale benchmark for subsumption-based reasoning in German law — with a real human baseline, a validated LLM judge, and human-AI co-creation data. The real centerpiece of the project.
Hi, I'm Sebastian. I happen to be a mix of lawyer, software developer and researcher. About as nerdy as it gets. I am based in Munich, Bavaria. I was the lead Legal Engineer at Hogan Lovells Technology (Eltemate) and have recently joined Prof. Grabmair at TUM as a PhD researcher, where I have the honor of working on high-impact legal tech projects like GSJ and TITAN. My research centers on replicating legal thought processes and on quality assurance through domain-specific benchmarking for the legal field.
When not working, I enjoy spending time with my wife (clearly the better scientist in our household), climbing mountains, riding my road bike or playing games like Schafkopf, Magic: The Gathering or Pathfinder 2nd Edition.




The platform has been accepted as a demo paper and is now free software.



