Understanding the Scores

Critical Errors

Factually incorrect statements about the film. These are the most serious mistakes.

Weight: 5x

Critical Omissions

Important information that was completely missing from the response.

Weight: 5x

Imprecisions

Partially correct or vaguely stated information that could mislead.

Weight: 1x

Notable Gaps

Minor missing details that would have improved the response.

Weight: 1x

Weighted Score Formula: ((Critical Errors + Critical Omissions) × 5) + Imprecisions + Notable Gaps

Lower scores indicate more accurate responses. A score of 0 means no errors of any category were found.
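The formula can be written as a small function. This is a minimal sketch; the function and constant names are illustrative and do not come from the site. Because the formula is linear, applying it to a model's per-category averages gives that model's average weighted score. The averages shown in the tables are rounded to two decimals, so a recomputed value can be off by about 0.01. For example, Gemini 2.0 Flash + Search's averages of 0.33, 0.67, 0.67 and 0.67 give 6.34 rather than the listed 6.33.

```python
# Illustrative names (not from the site). Weights as stated above:
# critical errors and critical omissions count 5x, imprecisions and
# notable gaps count 1x.
CRITICAL_WEIGHT = 5
MINOR_WEIGHT = 1


def weighted_score(critical_errors: float, critical_omissions: float,
                   imprecisions: float, notable_gaps: float) -> float:
    """Weighted score for one report, or for per-category averages."""
    critical = (critical_errors + critical_omissions) * CRITICAL_WEIGHT
    minor = (imprecisions + notable_gaps) * MINOR_WEIGHT
    return critical + minor


if __name__ == "__main__":
    # Averages for Gemini 3.0 Flash + Search from the unweighted table:
    # CE 0.38, O 0.08, I 0.91, G 0.47.
    print(round(weighted_score(0.38, 0.08, 0.91, 0.47), 2))  # 3.68
```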

AI Model Rankings

Compare models by different scoring methods

Ranked by weighted score: critical errors and critical omissions are penalized 5x as heavily as imprecisions and notable gaps.

Rank | Model | Reports | Avg Critical Errors | Avg Imprecisions | Weighted Score
🥇 | Gemini 3.0 Flash + Search | 1313 | 0.38 | 0.91 | 3.68
🥈 | Gemini 2.0 Flash + Search | 3 | 0.33 | 0.67 | 6.33
🥉 | Gemini 2.5 Flash + Search | 114 | 0.69 | 1.17 | 6.96
#4 | Claude 4.5 Haiku + Search | 30 | 0.77 | 1.03 | 7.73
#5 | Claude 4 Sonnet + Search | 21 | 0.67 | 1.52 | 8.62
#6 | Gemini 3.0 Flash | 330 | 1.16 | 1.28 | 9.5
#7 | Gemini 2.0 Flash | 87 | 1.54 | 1.52 | 17.01
#8 | Gemini 2.5 Flash | 49 | 2.41 | 1.84 | 22.27
#9 | Claude 4.5 Sonnet + Search | 2 | 1.0 | 0.0 | 26.0

Ranked by average critical errors only. This shows which models make the fewest factual mistakes.

Rank | Model | Reports | Avg Critical Errors
🥇 | Gemini 2.0 Flash + Search | 3 | 0.33
🥈 | Gemini 3.0 Flash + Search | 1313 | 0.38
🥉 | Claude 4 Sonnet + Search | 21 | 0.67
#4 | Gemini 2.5 Flash + Search | 114 | 0.69
#5 | Claude 4.5 Haiku + Search | 30 | 0.77
#6 | Claude 4.5 Sonnet + Search | 2 | 1.0
#7 | Gemini 3.0 Flash | 330 | 1.16
#8 | Gemini 2.0 Flash | 87 | 1.54
#9 | Gemini 2.5 Flash | 49 | 2.41

Ranked by total error count (all types combined, unweighted). This treats all errors equally.

CE = critical errors, I = imprecisions, O = critical omissions, G = notable gaps (each an average per report).

Rank | Model | Reports | CE | I | O | G | Total
🥇 | Gemini 3.0 Flash + Search | 1313 | 0.38 | 0.91 | 0.08 | 0.47 | 1.84
🥈 | Gemini 2.0 Flash + Search | 3 | 0.33 | 0.67 | 0.67 | 0.67 | 2.33
🥉 | Claude 4.5 Haiku + Search | 30 | 0.77 | 1.03 | 0.4 | 0.87 | 3.07
#4 | Gemini 2.5 Flash + Search | 114 | 0.69 | 1.17 | 0.27 | 0.97 | 3.11
#5 | Gemini 3.0 Flash | 330 | 1.16 | 1.28 | 0.32 | 0.82 | 3.58
#6 | Claude 4 Sonnet + Search | 21 | 0.67 | 1.52 | 0.38 | 1.86 | 4.43
#7 | Gemini 2.0 Flash | 87 | 1.54 | 1.52 | 1.22 | 1.7 | 5.98
#8 | Gemini 2.5 Flash | 49 | 2.41 | 1.84 | 1.33 | 1.76 | 7.33
#9 | Claude 4.5 Sonnet + Search | 2 | 1.0 | 0.0 | 3.5 | 3.5 | 8.0
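The three model rankings differ only in the key used to sort the same per-category averages. The sketch below assumes that the averages in the table above are the only inputs. Names such as `MODELS` and `rank` are illustrative. It reproduces all three orderings. The printed scores can differ from the listed ones by about 0.01 because the table values are already rounded.

```python
from typing import Callable

# (critical errors, imprecisions, critical omissions, notable gaps),
# copied from the unweighted table; names below are illustrative.
Scores = tuple[float, float, float, float]

MODELS: dict[str, Scores] = {
    "Gemini 3.0 Flash + Search": (0.38, 0.91, 0.08, 0.47),
    "Gemini 2.0 Flash + Search": (0.33, 0.67, 0.67, 0.67),
    "Claude 4.5 Haiku + Search": (0.77, 1.03, 0.40, 0.87),
    "Gemini 2.5 Flash + Search": (0.69, 1.17, 0.27, 0.97),
    "Gemini 3.0 Flash": (1.16, 1.28, 0.32, 0.82),
    "Claude 4 Sonnet + Search": (0.67, 1.52, 0.38, 1.86),
    "Gemini 2.0 Flash": (1.54, 1.52, 1.22, 1.70),
    "Gemini 2.5 Flash": (2.41, 1.84, 1.33, 1.76),
    "Claude 4.5 Sonnet + Search": (1.00, 0.00, 3.50, 3.50),
}


def weighted(s: Scores) -> float:
    """Critical errors and omissions count 5x; the rest count 1x."""
    ce, imp, om, gap = s
    return (ce + om) * 5 + imp + gap


def critical_only(s: Scores) -> float:
    """Average critical errors only."""
    return s[0]


def unweighted_total(s: Scores) -> float:
    """All four categories summed with equal weight."""
    return sum(s)


def rank(key: Callable[[Scores], float]) -> list[str]:
    """Model names sorted from best (lowest score) to worst."""
    return sorted(MODELS, key=lambda name: key(MODELS[name]))


if __name__ == "__main__":
    methods = [("weighted score", weighted),
               ("critical errors", critical_only),
               ("unweighted total", unweighted_total)]
    for label, key in methods:
        print(f"Ranked by {label}:")
        for i, name in enumerate(rank(key), start=1):
            print(f"  #{i} {name} ({key(MODELS[name]):.2f})")
```

Under the weighted score and the unweighted total, Gemini 3.0 Flash + Search leads. Sorting on critical errors alone puts Gemini 2.0 Flash + Search first.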

Film Accuracy Rankings

Films AI models handle most accurately (by average weighted score)

Rank | Film | Year | Reports | Avg Score
🥇 | Winter Light | 1963 | 3 | 0.0
🥈 | Ichi the Killer | 2001 | 1 | 0.0
🥉 | Godzilla | 1954 | 1 | 0.0
#4 | Brokeback Mountain | 2005 | 1 | 0.0
#5 | Star Trek: First Contact | 1996 | 1 | 0.0
#7 | Ted | 2012 | 3 | 0.0
#8 | Me Before You | 2016 | 2 | 0.0
#9 | LOLA | 2022 | 1 | 0.0
#10 | Hot Fuzz | 2007 | 1 | 0.0
#11 | Antiviral | 2012 | 1 | 0.0
#12 | Kick-Ass | 2010 | 2 | 0.0
#13 | The Queen's Gambit | 2020 | 1 | 0.0
#14 | Se7en | 1995 | 1 | 0.0
#15 | Kung Fu Panda 2 | 2011 | 1 | 0.0
#16 | Talk to Me | 2022 | 1 | 0.0
#17 | The Rock | 1996 | 1 | 0.0
#19 | The Cabin in the Woods | 2011 | 1 | 0.0

Most Challenging Questions

Questions that cause the most AI errors

#1
Kingdom of Crooked Mirrors (1963)
Regarding the film Kingdom of Crooked Mirrors (1963), how does the relationship between the two main characters evolve o...
Character Analysis 2 models tested
59.5
#2
Conclave (2024)
Did the real 2025 Conclave elect a Pope similar to the film's character?
Character Analysis 1 model tested
53.0
#3
All That Heaven Allows (1955)
Regarding the film All That Heaven Allows (1955), how much did the film make at both the box office and after (detail br...
Reception & Reviews 2 models tested
47.5
#4
Mickey 17 (2025)
What is the significance of the dream sequence at the end of Mickey 17?
General Analysis 1 model tested
46.0
#5
Megamind (2010)
Regarding the film Megamind (2010), list all the other films the lead actor has been in, I think I have seen them before...
Filmography 1 model tested
42.0
#6
mid90s (2018)
Regarding mid90s (2018), what is the plot of the film Lurker (2025)?
Plot Analysis 1 model tested
38.0
#7
Pitch Perfect (2012)
Regarding Pitch Perfect (2012), what happened at the Riff-Off and why did the Bellas lose?
General Analysis 1 model tested
37.0
#8
Beautiful Thing (1996)
What is the significance of the song 'It's Getting Better' in Beautiful Thing?
General Analysis 1 model tested
37.0
#9
All About My Mother (1999)
Regarding the film All About My Mother (1999), how does the relationship between the two main characters evolve over the...
Character Analysis 2 models tested
34.0
#10
Bullet Train (2022)
What is the significance of the water bottle in Bullet Train?
General Analysis 1 model tested
33.0
#11
From What Is Before (2014)
What is the significance of the character Heding in From What Is Before?
Character Analysis 1 model tested
33.0
#12
A Quiet Place (2018)
Regarding A Quiet Place (2018), does Evelyn tell Lee to fix his relationship with Regan?
Character Relationships 1 model tested
32.0
#13
Thesis (1996)
What are the differences between the ending of Thesis and the alternate ending?
General Analysis 1 model tested
32.0
#14
Illuminations (1976)
Did you mean the 1976 film Illuminations by Paul Cox?
General Analysis 1 model tested
32.0
#15
Monster (2023)
What is the significance of the horn sound in Monster (2023)?
General Analysis 1 model tested
31.0
#16
A Minecraft Movie (2025)
What happens in the post-credits scene of A Minecraft Movie?
General Analysis 1 model tested
31.0
#17
The Girl Who Knew Too Much (1963)
What is the significance of the 'Alphabet Killer' in The Girl Who Knew Too Much?
General Analysis 1 model tested
31.0
#18
Five Nights at Freddy's (2023)
Regarding Five Nights at Freddy's (2023), what is the specific torture device used in the opening scene of the FNAF movi...
General Analysis 1 model tested
30.0
#19
Spider-Man: Homecoming (2017)
Regarding the film Spider-Man: Homecoming (2017), what awards were associated with the film, including foreign awards an...
Awards & Nominations 4 models tested
28.67
#20
My Best Fiend (1999)
Is the butterfly scene in My Best Fiend staged?
Cast Ages 1 model tested
28.0