Dataset for inducing hallucinations, or a dataset that already contains hallucinations

Hi all

I want to test a hallucination detection library, so I’m looking for one or both of the following:

  • A dataset of inputs that are likely to produce hallucinations

  • A dataset that already contains inputs and outputs (which may or may not be hallucinations). Ideally it would have multiple outputs for each input and specify which model (open source or proprietary) produced each output. Even better if ChatGPT is one of the models.
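
To make the second item concrete, here is a rough sketch of the record shape I have in mind. All class and field names are made up for illustration, and the sample rows are invented rather than taken from any real dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Output:
    model: str                               # which model produced the text (hypothetical label)
    text: str                                # the generated answer
    is_hallucination: Optional[bool] = None  # None means the output is unlabeled


@dataclass
class Example:
    prompt: str                              # the input given to every model
    outputs: List[Output] = field(default_factory=list)

    def models(self) -> Set[str]:
        """Return the set of models that have an output for this prompt."""
        return {o.model for o in self.outputs}


# Invented sample record: one input, several outputs, each tagged with its model.
example = Example(
    prompt="Who wrote 'Pride and Prejudice'?",
    outputs=[
        Output(model="chatgpt", text="Jane Austen.", is_hallucination=False),
        Output(model="some-open-model", text="Charlotte Bronte.", is_hallucination=True),
    ],
)
```

A dataset without hallucination labels would still be useful; in that case `is_hallucination` would simply stay `None`.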

Does anything like this exist? Thanks!