My first instinct was creativity. I had models generate poems, short stories, and metaphors: the kind of rich, open-ended output that feels like it should reveal deep differences in cognitive ability. I used an LLM-as-judge to score the outputs, but the results were pretty bad. I managed to fix the LLM-as-judge setup with some engineering, and the scoring system turned out to be useful later for other things, so here it is: