Hi,
I am following the course and am now at the "Fine-tuning a pretrained model" chapter of the Hugging Face Course. While trying to reproduce the course code in Kaggle, I got an error when setting up DataCollatorWithPadding as follows. The error occurs on both CPU-only and GPU instances.
Input:
```python
from transformers import AutoTokenizer, DataCollatorWithPadding

checkpoint = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
```
Output:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_42/1563280798.py in <module>
      1 checkpoint = 'bert-base-uncased'
      2 tokenizer = AutoTokenizer.from_pretrained(checkpoint)
----> 3 data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")

TypeError: __init__() got an unexpected keyword argument 'return_tensors'
```
Calling help() on the constructor also confirms that there is no return_tensors argument.
Input:
```python
help(DataCollatorWithPadding.__init__)
```
Output:
```
Help on function __init__ in module transformers.data.data_collator:

__init__(self, tokenizer: transformers.tokenization_utils_base.PreTrainedTokenizerBase, padding: Union[bool, str, transformers.file_utils.PaddingStrategy] = True, max_length: Union[int, NoneType] = None, pad_to_multiple_of: Union[int, NoneType] = None) -> None
```
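Since Kaggle notebooks ship with a preinstalled transformers, I wonder whether this is simply a version mismatch, i.e. the installed release predates the return_tensors argument (this is my guess, not something I have confirmed). A minimal check:

```python
import transformers

# Print the installed transformers version; if it is older than the 4.12.5
# docs referenced below, return_tensors may simply not exist yet.
print(transformers.__version__)
```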
However, the Data Collator page of the transformers 4.12.5 documentation says that such an argument does exist. By default the collator returns PyTorch tensors, while I need TensorFlow tensors.
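In the meantime, would something like this be a reasonable fallback? This is only a sketch, assuming tokenizer.pad accepts return_tensors on my installed version (the tf_collate name and the features argument are mine):

```python
from transformers import AutoTokenizer

checkpoint = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Hypothetical fallback: pad a batch (a list of feature dicts) via the
# tokenizer itself and request TensorFlow tensors directly, bypassing
# DataCollatorWithPadding's return_tensors argument entirely.
def tf_collate(features):
    return tokenizer.pad(features, padding=True, return_tensors="tf")
```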
What am I missing? Please help.