I have a pet theory that hallucinations can be mitigated by agents in a sort of "peer-review" session. To test this, I had ChatGPT give me a breakdown of the Flames' 2004 cup run, then handed that output to a separate LLM and told it to sweep for inaccuracies.
It looks like the original output got the following wrong:
- It said Kipper set a record for GAA (it was a "modern day" record)
- It said we won 3 in OT vs Detroit (we won 2)
The second LLM did actually catch both errors and correct them... so in a weird way this sorta worked. Of course, it couldn't be less scientific, and I can't be 100% certain the reviewer didn't miss other things. Still, it patched two gaping holes.
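For anyone who wants to try this, the loop I ran is dead simple: one model drafts, a second model reviews the draft for factual errors. Here's a minimal sketch of that flow. Note that `call_llm` is a hypothetical stand-in for whatever chat API you actually use (OpenAI, Anthropic, a local model); it's stubbed out with canned responses here just so the wiring is visible.

```python
def call_llm(role: str, prompt: str) -> str:
    """Hypothetical placeholder for a real chat-completion call.

    In practice you'd hit your provider's API with a different model
    (or at least a different system prompt) for each role. Canned
    responses are used here purely to illustrate the flow.
    """
    canned = {
        "writer": (
            "Kiprusoff set the all-time GAA record, and Calgary "
            "won 3 overtime games against Detroit."
        ),
        "reviewer": (
            "Corrections: it was a modern-era GAA record, and "
            "Calgary won 2 overtime games against Detroit."
        ),
    }
    return canned[role]


def peer_review(topic: str) -> str:
    # Step 1: one model produces the draft.
    draft = call_llm("writer", f"Give me a breakdown of {topic}.")
    # Step 2: a separate model sweeps the draft for inaccuracies.
    review = call_llm(
        "reviewer",
        "Sweep the following text for factual inaccuracies "
        f"and list any corrections:\n\n{draft}",
    )
    return review


print(peer_review("the Flames' 2004 cup run"))
```

The key design point is that the reviewer never sees the original question's "ideal" answer, only the draft, so any corrections come from its own knowledge rather than from being told what to fix.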