The code and error are as follows. How can I fix it? Any suggestions are welcome!
This issue?
I read this thread, but it didn’t solve the problem.
I downloaded the target file, opened it with Notepad, and found that the encoding was already UTF-8.
I have run this program on Colab and everything works normally, but when I run it in Jupyter Notebook, it reports the error “not contain valid UTF-8”.
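Since Notepad only shows the encoding of the one file that was opened, it may help to check every file the program actually reads. Below is a minimal sketch using only the standard library; `find_non_utf8` is a hypothetical helper name, not part of any tokenizer library, and it assumes the inputs are plain files on disk.

```python
from pathlib import Path


def find_non_utf8(paths):
    """Return the paths whose raw bytes do not decode as strict UTF-8."""
    bad = []
    for p in paths:
        try:
            # Decode the raw bytes strictly; invalid sequences raise an error.
            Path(p).read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(str(p))
    return bad
```

Calling `find_non_utf8(paths)` on the same list of files that is passed to the training step should show whether any of them is the one triggering the error.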
If it doesn’t work in Colab but does work in Jupyter, I can understand that, but the opposite… That’s rare…
In any case, I think that’s not a Python error, but an error in Rust or something. It’s a rare case where the library version is wrong or the library core is unable to process something and is throwing an error.
Thanks for your reply
I’ve attached the tokenizer versions from Colab and Jupyter; they are the same.
Most likely it’s what you think
There was a similar issue with the library itself. I don’t know if this is it. If the data being handled is exactly the same, then this probably isn’t it…
Thanks for sharing Rust encoding/decoding knowledge.
The issue has been solved. The Path variable was the source of the problem.
In Colab, only the single downloaded txt file was picked up, but in Jupyter Notebook, many txt files from unrelated directories were picked up as well.
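For anyone who hits the same thing: the original code is not shown here, so the following is only a sketch, assuming the file list was built with a `pathlib.Path` glob. The helper name `collect_txt` and its `recursive` flag are made up for illustration. It shows how a recursive pattern picks up txt files in subdirectories, while a non-recursive pattern on a dedicated data directory does not.

```python
from pathlib import Path


def collect_txt(root, recursive=False):
    """List .txt files under root; recursive=True also walks subdirectories."""
    # "**/*.txt" matches files in root and in every nested directory,
    # while "*.txt" matches only files directly inside root.
    pattern = "**/*.txt" if recursive else "*.txt"
    return sorted(str(p) for p in Path(root).glob(pattern))
```

Printing the result (or just its length) before passing it to the training step makes it obvious when the working directory differs between environments and extra files are being included.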