le_throosh@lemmy.dbzer0.com to Fuck AI@lemmy.world · 1 month ago
bro (image, thelemmy.club) · 6 comments
pkjqpg1h@lemmy.zip · 1 month ago
According to the AA-Omniscience benchmark, among the most expensive models, Opus 4.6 has a 60% hallucination rate and a 46% accuracy rate. Gemini 3.1 Pro Preview has a 50% hallucination rate and a 55% accuracy rate.
And the questions aren’t even open-ended.
I don’t even need to tell you about the other models.
Kairos@lemmy.today · 1 month ago (edited)
“Opus 4.6”, like every other LLM, has a 100% hallucination rate, because that’s literally the only thing they do.