I am working through the Hugging Face NLP course.
In Chapter 3 (Fine-tuning a pretrained model),
section 3 (Fine-tuning a model with the Trainer API),
under the “Evaluation” heading, I’m trying to run the following code exactly as given in the course:
import evaluate
metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=predictions.label_ids)
But when I run it, I get the following ValueError:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[12], line 4
1 import evaluate
3 metric = evaluate.load("glue", "mrpc")
----> 4 metric.compute(predictions=preds, references=predictions.label_ids)
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/evaluate/module.py:465, in EvaluationModule.compute(self, predictions, references, **kwargs)
462 if self.process_id == 0:
463 self.data.set_format(type=self.info.format)
--> 465 inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
466 with temp_seed(self.seed):
467 output = self._compute(**inputs, **compute_kwargs)
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/arrow_dataset.py:2866, in Dataset.__getitem__(self, key)
2864 def __getitem__(self, key): # noqa: F811
2865 """Can be used to index columns (by string names) or rows (by integer index or iterable of indices or bools)."""
-> 2866 return self._getitem(key)
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/arrow_dataset.py:2851, in Dataset._getitem(self, key, **kwargs)
2849 formatter = get_formatter(format_type, features=self._info.features, **format_kwargs)
2850 pa_subtable = query_table(self._data, key, indices=self._indices)
-> 2851 formatted_output = format_table(
2852 pa_subtable, key, formatter=formatter, format_columns=format_columns, output_all_columns=output_all_columns
2853 )
2854 return formatted_output
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/formatting/formatting.py:633, in format_table(table, key, formatter, format_columns, output_all_columns)
631 python_formatter = PythonFormatter(features=formatter.features)
632 if format_columns is None:
--> 633 return formatter(pa_table, query_type=query_type)
634 elif query_type == "column":
635 if key in format_columns:
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/formatting/formatting.py:399, in Formatter.__call__(self, pa_table, query_type)
397 return self.format_row(pa_table)
398 elif query_type == "column":
--> 399 return self.format_column(pa_table)
400 elif query_type == "batch":
401 return self.format_batch(pa_table)
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/formatting/np_formatter.py:94, in NumpyFormatter.format_column(self, pa_table)
93 def format_column(self, pa_table: pa.Table) -> np.ndarray:
---> 94 column = self.numpy_arrow_extractor().extract_column(pa_table)
95 column = self.python_features_decoder.decode_column(column, pa_table.column_names[0])
96 column = self.recursive_tensorize(column)
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/formatting/formatting.py:162, in NumpyArrowExtractor.extract_column(self, pa_table)
161 def extract_column(self, pa_table: pa.Table) -> np.ndarray:
--> 162 return self._arrow_array_to_numpy(pa_table[pa_table.column_names[0]])
File ~/Desktop/Hugging Faces/transformers-course/.env/lib/python3.12/site-packages/datasets/formatting/formatting.py:197, in NumpyArrowExtractor._arrow_array_to_numpy(self, pa_array)
191 if any(
192 (isinstance(x, np.ndarray) and (x.dtype == object or x.shape != array[0].shape))
193 or (isinstance(x, float) and np.isnan(x))
194 for x in array
195 ):
196 return np.array(array, copy=False, dtype=object)
--> 197 return np.array(array, copy=False)
ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
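The suggestion in the error message can be illustrated with a small standalone sketch. This is not code from `datasets`, and the helper names are hypothetical. Under NumPy 2.x, `copy=False` means "never copy" and raises `ValueError` when a copy cannot be avoided (for example, when converting a Python list), whereas NumPy 1.x silently copied. `np.asarray` copies only when needed on both versions.

```python
import numpy as np


# Hypothetical helpers contrasting the two call styles named in the error message.
def strict_no_copy(obj):
    # Pattern seen in the traceback: on NumPy 2.x this raises ValueError
    # whenever the array cannot be built without copying.
    return np.array(obj, copy=False)


def copy_if_needed(obj):
    # Replacement suggested by the error message: copies only if required.
    return np.asarray(obj)


numpy_major = int(np.__version__.split(".")[0])
values = [1, 2, 3]  # a Python list always has to be copied into a new ndarray

try:
    strict_no_copy(values)
    strict_raised = False
except ValueError:
    strict_raised = True

print(f"NumPy {np.__version__}: copy=False on a list raised ValueError: {strict_raised}")

safe = copy_if_needed(values)  # works on NumPy 1.x and 2.x
existing = np.arange(3)
# np.asarray returns an existing ndarray unchanged instead of copying it
print(safe, copy_if_needed(existing) is existing)
```

On NumPy 1.x the `try` block succeeds, so the same call site only starts failing after upgrading to NumPy 2.x.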
I assume this is an incompatibility between the `datasets` library and NumPy 2.0, which changed the meaning of `copy=False` after the library code was written. I can’t change the code of that library, since it is maintained by Hugging Face, not me. So how can I get past this error?
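One way to check that assumption is to look at which NumPy and `datasets` versions the course's virtual environment actually has installed. The sketch below uses only the standard library's `importlib.metadata`; the helper names are hypothetical, and it only reports versions rather than deciding whether a given `datasets` release supports NumPy 2.

```python
# Hypothetical diagnostic: report installed versions and flag NumPy 2.x,
# the release in which np.array(..., copy=False) started raising.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def numpy_is_2_or_later(numpy_version: str) -> bool:
    """True if a version string such as '2.0.1' has a major version >= 2."""
    return int(numpy_version.split(".")[0]) >= 2


if __name__ == "__main__":
    np_ver = installed_version("numpy")
    ds_ver = installed_version("datasets")
    print(f"numpy={np_ver}, datasets={ds_ver}")
    if np_ver is not None and numpy_is_2_or_later(np_ver):
        print("NumPy 2.x detected: np.array(obj, copy=False) raises if a copy is required.")
```

If it reports NumPy 2.x, installing `numpy<2` in that virtual environment restores the old silent-copy behavior. Alternatively, a newer `datasets` release may already replace the `copy=False` call; its release notes are the place to confirm that.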