How to custom-batch sentences and graphs

AfonsoSousa · January 4, 2023, 4:14pm

Hi. I am using a script much like this example. Some of my attributes are strings that need to be tokenized and padded, but each sample also has a graph, which I convert into a PyTorch Geometric (PyG) Data object and which also needs to be batched.
Is there a way to override the batching process in Seq2SeqTrainer so that I can use the PyG DataLoader?
If not, I could pass the node features and adjacency matrix as tensors, but the adjacency matrix varies in size from sample to sample. How could I batch it in the preprocess_function?
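
To make the second option concrete, here is a rough sketch of what I have in mind, not a tested solution: zero-pad every adjacency matrix to a fixed size and keep a mask of the real nodes, so that all samples end up with the same shape. The column name "adjacency", the max_nodes parameter, and both helper functions are hypothetical names I made up for illustration. It uses plain Python lists because that is what a batched preprocess_function receives and returns.

```python
from typing import Dict, List

Matrix = List[List[int]]


def pad_adjacency(adj: Matrix, max_nodes: int) -> Matrix:
    """Zero-pad a square n x n adjacency matrix to max_nodes x max_nodes."""
    n = len(adj)
    if n > max_nodes:
        raise ValueError(f"graph has {n} nodes, more than max_nodes={max_nodes}")
    # Extend each existing row with zeros, then append all-zero rows.
    padded = [list(row) + [0] * (max_nodes - n) for row in adj]
    padded += [[0] * max_nodes for _ in range(max_nodes - n)]
    return padded


def add_padded_graphs(examples: Dict[str, List[Matrix]], max_nodes: int) -> Dict[str, list]:
    """Hypothetical helper meant to be called from a batched preprocess_function.

    Assumes examples["adjacency"] holds one square matrix per sample.
    """
    adjs = examples["adjacency"]
    return {
        "adjacency": [pad_adjacency(a, max_nodes) for a in adjs],
        # 1 marks a real node, 0 marks padding.
        "node_mask": [[1] * len(a) + [0] * (max_nodes - len(a)) for a in adjs],
    }


# Two toy graphs with 2 and 3 nodes.
batch = {"adjacency": [[[0, 1], [1, 0]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]}
out = add_padded_graphs(batch, max_nodes=3)
```

The obvious downside is that memory grows with max_nodes squared even for small graphs, which is why I would prefer the first option if it is possible. As far as I can tell, Seq2SeqTrainer accepts a data_collator argument, so a custom collator that builds the PyG batch might be the place to do it, but I have not tried this.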
