• wizardbeard@lemmy.dbzer0.com
    6 days ago

    An earlier version was widely publicized for getting this wrong, so the next release shipped with what must have been a manual fix layered on top: it would smarmily say something along the lines of “haha, you almost got me”. It was still easy to demonstrate that the fix was a bodge job, though, because changing the words slightly was enough to avoid tripping the hard-coded handling for this “riddle”.

    I guess they figured no one was still paying attention and forgot to carry over the bodge job, lol.

    • brucethemoose@lemmy.world
      6 days ago

      This has been happening forever. The local LLM folks poke models with riddles all the time, but then the riddles obviously get trained in.

      What’s more, standard tests like MMLU are all jokes now. All the major LLMs game the benchmarks and are contaminated up and down; Meta even got caught using a specific finetune to game LM Arena. The only tests worth a damn are the ones in little corners of the internet no one knows about, or private ones.