AI Models | Rewinder

Models

Analyses were not run to provide direct comparisons between models, and any figures here should be taken with an entire bucket of salt because the comparisons are not like-for-like. That said, the most general observation, that search grounding significantly improved scores on these questions, is likely supportable, as is the observation that even recent models struggled without search grounding.

Please note also that there is a mismatch between the scores shown here (which were computed at analysis time by different models) and the scores on the individual report pages, which were recomputed after the fact using different criteria and a more consistent system of classification. In other words, this is a mess that is good for raising questions but not as useful for answering them. Those questions will be answered later by more focused runs.

Model	Search	Reports	Avg Score	Avg Errors
Gemini 3.0 Flash + Search	Yes	1313	3.68	0.38
Gemini 2.5 Flash + Search	Yes	342	4.65	0.46
Gemini 2.0 Flash + Search	Yes	3	6.33	0.33
Claude 4.5 Haiku + Search	Yes	30	7.73	0.77
Claude 4 Sonnet + Search	Yes	21	8.62	0.67
Gemini 3.0 Flash	No	430	9.12	1.13
Gemini 2.0 Flash	No	87	17.01	1.54
Gemini 2.5 Flash	No	299	19.01	2.5
Claude 4.5 Sonnet + Search	Yes	2	26.0	1.0
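
As a rough way to see the search versus no-search gap in the table above, the per-model averages can be pooled by report count. This is only a sketch: it assumes each Avg Score and Avg Errors value is a plain mean over that row's reports, it works from the rounded figures shown, and the names `ROWS`, `pooled_mean`, and `summarize` are made up for illustration. It does nothing to fix the like-for-like problem described above, so the output is a summary of this table and not a benchmark result.

```python
# Rows copied from the table above:
# (model, search_grounded, reports, avg_score, avg_errors)
ROWS = [
    ("Gemini 3.0 Flash + Search", True, 1313, 3.68, 0.38),
    ("Gemini 2.5 Flash + Search", True, 342, 4.65, 0.46),
    ("Gemini 2.0 Flash + Search", True, 3, 6.33, 0.33),
    ("Claude 4.5 Haiku + Search", True, 30, 7.73, 0.77),
    ("Claude 4 Sonnet + Search", True, 21, 8.62, 0.67),
    ("Gemini 3.0 Flash", False, 430, 9.12, 1.13),
    ("Gemini 2.0 Flash", False, 87, 17.01, 1.54),
    ("Gemini 2.5 Flash", False, 299, 19.01, 2.5),
    ("Claude 4.5 Sonnet + Search", True, 2, 26.0, 1.0),
]


def pooled_mean(rows, column):
    """Report-weighted mean of a per-model average (column index 3 or 4)."""
    total_reports = sum(row[2] for row in rows)
    weighted_sum = sum(row[2] * row[column] for row in rows)
    return weighted_sum / total_reports


def summarize(rows):
    """Pool the rows into two groups: search grounded (True) and not (False)."""
    summary = {}
    for grounded in (True, False):
        group = [row for row in rows if row[1] is grounded]
        summary[grounded] = {
            "reports": sum(row[2] for row in group),
            "avg_score": pooled_mean(group, 3),
            "avg_errors": pooled_mean(group, 4),
        }
    return summary


SUMMARY = summarize(ROWS)

if __name__ == "__main__":
    for grounded, stats in SUMMARY.items():
        label = "Search" if grounded else "No search"
        print(
            f"{label}: {stats['reports']} reports, "
            f"avg score {stats['avg_score']:.2f}, "
            f"avg errors {stats['avg_errors']:.2f}"
        )
```

Weighting by report count keeps rows with only two or three reports from dominating the pooled figure, which matters here because the row counts range from 2 to 1313.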