Can't run trainer.predict(). ValueError: 'process_id' should be a number greater than 0

I’m using Colab Pro to train the model. I’m trying to evaluate my model on metrics, but I keep getting the following error when I run trainer.predict(test_dataset):

<ipython-input-16-cefe9f0a93b9> in compute_metrics(eval_preds)
      1 def compute_metrics(eval_preds):
----> 2     metric = load_metric("accuracy", "f1", "precision", "recall", "roc_auc")
      3     logits, labels = eval_preds
      4     preds = np.argmax(logits, axis=-1)
      5     return metric.compute(predictions=preds, references=labels)

/usr/local/lib/python3.7/dist-packages/datasets/ in load_metric(path, config_name, process_id, num_process, cache_dir, experiment_id, keep_in_memory, download_config, download_mode, revision, **metric_init_kwargs)
   1445         keep_in_memory=keep_in_memory,
   1446         experiment_id=experiment_id,
-> 1447         **metric_init_kwargs,
   1448     )

/usr/local/lib/python3.7/dist-packages/datasets/ in __init__(self, config_name, keep_in_memory, cache_dir, num_process, process_id, seed, experiment_id, max_concurrent_cache_files, timeout, **kwargs)
    179         # Safety checks on num_process and process_id
    180         if not isinstance(process_id, int) or process_id < 0:
--> 181             raise ValueError("'process_id' should be a number greater than 0")
    182         if not isinstance(num_process, int) or num_process <= process_id:
    183             raise ValueError("'num_process' should be a number greater than process_id")

ValueError: 'process_id' should be a number greater than 0

Here’s what my code currently looks like:

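import numpy as np
import torch
from pathlib import Path
from datasets import load_metric
from sklearn.model_selection import train_test_split
from transformers import RobertaForSequenceClassification, Trainer, TrainingArguments

def read_data_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())  # keep texts aligned with labels
            labels.append(0 if label_dir == "neg" else 1)  # compare strings with ==, not is

    return texts, labels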

train_texts, train_labels = read_data_split('')

# Split the dataset: 80% train, then halve the remainder into test and validation

train_texts, test_texts, train_labels, test_labels = train_test_split(train_texts, train_labels, test_size=.2)
test_texts, val_texts, test_labels, val_labels = train_test_split(test_texts, test_labels, test_size=.5)

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)

class AlignmentPaperDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:
            item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

train_dataset = AlignmentPaperDataset(train_encodings, train_labels)
test_dataset = AlignmentPaperDataset(test_encodings, test_labels)
val_dataset = AlignmentPaperDataset(val_encodings, val_labels)

def compute_metrics(eval_preds):
    metric = load_metric("accuracy", "f1", "precision", "recall", "roc_auc")
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

model = RobertaForSequenceClassification.from_pretrained("roberta-base")

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    compute_metrics=compute_metrics,     # metrics callback defined above
)


predictions = trainer.predict(test_dataset)  # <--- fails here

# Preprocess raw predictions
preds = np.argmax(predictions.predictions, axis=-1)

The predict method works fine when I am not using compute_metrics, and I’m not sure how to fix this. I’m following the tutorial “Fine-tuning a model with the Trainer API” from the Hugging Face Course.

I’m not sure if it is a Colab issue that is causing the metrics error.

I ran into the same problem yesterday trying to make a custom compute_metrics function for distilbert multiclass.

The problem ended up being with load_metric(). This worked:

def compute_metrics(eval_preds):
    metric = load_metric("accuracy", "glue")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

I had to ditch mrpc and I think I had to specifically call “accuracy” first as well.
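
The signature in the traceback may explain why this works: load_metric takes a single metric path, followed by config_name, process_id, num_process and cache_dir, so extra metric names passed positionally appear to land in those slots. Here is a minimal sketch of that binding. It uses a hypothetical stand-in, load_metric_stub, which copies only the parameter order from the traceback (the defaults are assumed) and never calls the real library:

```python
import inspect


# Hypothetical stand-in: same leading parameter order as the load_metric
# signature shown in the traceback. Defaults are assumptions for illustration.
def load_metric_stub(path, config_name=None, process_id=0, num_process=1,
                     cache_dir=None, **metric_init_kwargs):
    pass


# Bind the arguments exactly as they were passed in the failing call.
bound = inspect.signature(load_metric_stub).bind(
    "accuracy", "f1", "precision", "recall", "roc_auc"
)
for name, value in bound.arguments.items():
    print(f"{name} = {value!r}")

# Same condition as the safety check quoted in the traceback.
process_id = bound.arguments["process_id"]
check_fails = not isinstance(process_id, int) or process_id < 0
print("process_id check fails:", check_fails)
```

Here "precision" ends up bound to process_id. It is a string rather than an int, so the safety check raises even though the message talks about a number greater than 0. With load_metric("accuracy", "glue") only path and config_name are filled, which is presumably why that call gets past the check.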

Hmm, so roc_auc (the only metric I really need) is maybe the issue…

Yeah, so I had to remove roc_auc from load_metric for it to work. However, I can still run:

metric = load_metric("roc_auc")
metric.compute(prediction_scores=preds, references=predictions.label_ids)

after I’ve trained the model.
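
If all the metrics are wanted in a single compute_metrics, one option is to load each metric on its own, once, and merge the result dictionaries. This is a minimal sketch under three assumptions: each metric object has a compute() method that returns a dict, the task is binary with class 1 as the positive class, and make_compute_metrics is a hypothetical helper name (it is not part of datasets or transformers):

```python
import numpy as np


def make_compute_metrics(label_metrics, score_metrics=None):
    """Build a compute_metrics callback from already-loaded metric objects.

    label_metrics: metrics whose compute() takes predictions= and references=
    score_metrics: metrics whose compute() takes prediction_scores= and
                   references= (assumed binary, class 1 is positive)
    """
    score_metrics = score_metrics or []

    def compute_metrics(eval_preds):
        logits, labels = eval_preds
        logits = np.asarray(logits)
        preds = np.argmax(logits, axis=-1)

        results = {}
        # Metrics that compare hard class predictions against the labels.
        for metric in label_metrics:
            results.update(metric.compute(predictions=preds, references=labels))

        if score_metrics:
            # Numerically stable softmax over the class axis.
            shifted = logits - logits.max(axis=-1, keepdims=True)
            exp = np.exp(shifted)
            probs = exp / exp.sum(axis=-1, keepdims=True)
            # Metrics that need a continuous score for the positive class.
            for metric in score_metrics:
                results.update(
                    metric.compute(prediction_scores=probs[:, 1], references=labels)
                )
        return results

    return compute_metrics
```

With the setup above, this could be wired up as compute_metrics = make_compute_metrics([load_metric(name) for name in ("accuracy", "f1", "precision", "recall")], [load_metric("roc_auc")]) and passed to Trainer. The sketch feeds roc_auc the positive-class probability rather than the argmax labels; that is an assumption about what is wanted here, not something the thread confirms.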