Running out of Memory with run_clm.py

@lewtun
In answer to my question on big data size and lazy loading:
The Hugging Face Datasets `DatasetDict` format and its `map` method, which can apply any function such as tokenisation or grouping, are designed to run in batches. Because batches are processed one at a time, it can handle data of any size. So to work with a large dataset, convert it to the `DatasetDict` format and process it with `map`.