Random utf-8 errors from dataset

I’ve since moved on from that project, so unfortunately I can’t give a stack trace. That said, I think it was more a data problem than a datasets problem. I still see the error with some of the data I’m using now, but I’ve started running chardet in my data pipeline to detect each file’s encoding, and that seems to fix it (though it’s a bit slow).
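The post doesn’t show how chardet is wired in. Here is a minimal sketch, assuming each record or file arrives as raw bytes; `decode_bytes` is a hypothetical helper name. It tries strict UTF-8 first and calls `chardet.detect` only when that fails, which may help with the slowness, since detection then runs only on the problem records:

```python
from typing import Optional

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # keep the sketch usable even without chardet installed
    chardet = None


def decode_bytes(raw: bytes) -> str:
    """Decode raw bytes: strict UTF-8 first, encoding detection only on failure."""
    try:
        # Fast path: most records are valid UTF-8, so skip detection entirely.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    encoding: Optional[str] = None
    if chardet is not None:
        # chardet.detect returns a dict whose "encoding" entry may be None.
        encoding = chardet.detect(raw).get("encoding")

    if encoding:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            pass  # the guess was wrong, or Python doesn't know that codec name

    # Last resort: keep the record but replace undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

Valid UTF-8 passes straight through (`decode_bytes("héllo".encode("utf-8"))` gives `"héllo"`), while something like Latin-1 bytes goes through chardet’s guess and, failing that, is decoded with replacement characters rather than raising.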