Training doesn't end properly but stops the machine with no error message

The “Please note that with a fast tokenizer…” warning is harmless. It's been discussed extensively elsewhere, e.g., in Get “using the __call__ method is faster” warning with DataCollatorWithPadding. You can disable it (see that thread).
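If you want to silence it, one option (a minimal sketch, assuming a recent transformers version) is to turn off the library's advisory warnings via the `TRANSFORMERS_NO_ADVISORY_WARNINGS` environment variable before transformers is imported:

```python
import os

# Must be set before importing transformers; suppresses "advisory"
# warnings such as the fast-tokenizer padding notice.
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer=tokenizer)
```

Alternatively, `transformers.logging.set_verbosity_error()` hides it too, but that also hides all other warnings.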

I’m a little confused. Did the program OOM or did the training finish? #3 indicates that the program crashed, but #6 suggests the program finished.

I’m also confused by #3. You say nothing was saved in the output_dir, but also that the output files were corrupted during saving. If nothing was saved, how could the saved output be corrupted?
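One way to disambiguate (a minimal sketch; `checkpoint-500` is a hypothetical folder name, substitute whatever actually appears in your output_dir): list the directory and try to reload a checkpoint. An empty or missing folder means nothing was saved; a load error means files were written but are corrupted or incomplete.

```python
import os
from transformers import AutoModel

# Hypothetical paths; replace with your actual output_dir and checkpoint folder.
output_dir = "output_dir"
print(os.listdir(output_dir))  # empty list -> nothing was saved at all

ckpt = os.path.join(output_dir, "checkpoint-500")  # assumed checkpoint name
# If this raises, the files exist but are corrupted/incomplete;
# if the folder is missing entirely, saving never happened.
model = AutoModel.from_pretrained(ckpt)
```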

Can you post the command used to execute the script, as well as the terminal output? You also mention this is your own task or dataset; please elaborate on that.