Corruption when running the trainer

Whenever I import the trainer library and create a Trainer (it does not even need to be started for this to happen), every Python file in the directory where the trainer code ran starts to break. For instance, a simple method that loads a pretrained wav2vec2 model and tests it on a standard input goes from working to outputting complete gibberish after I instantiate the Trainer class. Moving all the files to a new directory fixes this, but deleting the transformers cache does not. I need the Trainer for training, so simply avoiding it is not an option. I suspect this is some sort of CUDA error, since the Trainer sends the model to CUDA when it runs.

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=atco2["train"],
    eval_dataset=atco2["test"],
    tokenizer=processor.feature_extractor,
)

I seem to be having a similar issue. Did you ever resolve it? Which files need to be moved to a new directory? Does using the CPU instead fix anything?

Yes, I fixed it. For some reason, Wav2Vec2Processor.from_pretrained() automatically loads any file named vocab.json in the directory as the vocab file. This happens regardless of whether you actually point to that file in the from_pretrained() call. My vocab.json happened to be very wrong, which caused the “corruption”. If you are fine-tuning, just use the default vocab and rename or delete any vocab.json files you might have.
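For anyone who wants to check whether they are in the same situation, here is a rough sketch that lists vocab.json files sitting in the working directory. The helper name find_stray_vocab_files is made up for this post and is not part of transformers, and looking one folder level down (for example a training output folder) is my own assumption about where such a file might end up.

```python
from pathlib import Path


def find_stray_vocab_files(directory="."):
    """Return vocab.json files directly in `directory` or one level below it.

    Hypothetical helper for illustration; it only inspects the filesystem
    and does not call into transformers at all.
    """
    root = Path(directory)
    # Top-level file plus any copy inside an immediate subdirectory.
    candidates = list(root.glob("vocab.json")) + list(root.glob("*/vocab.json"))
    return sorted(p for p in candidates if p.is_file())


if __name__ == "__main__":
    for path in find_stray_vocab_files():
        print(f"Found {path}: rename or delete it if it is not the vocab you mean to load")
```

Run it from the directory that contains your training script. If it prints anything you did not put there on purpose, rename that file and reload the processor to see whether the gibberish goes away.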