Fine-tuning Transformers: architecture/code structure

Hey everyone, I’m new to the forums but have been following HuggingFace and Transformers for the last few years.

I’m starting my MSc thesis, where I’ll be fine-tuning transformer models for text classification. However, I’m not sure of the best way to structure my code.

I plan to run multiple experiments that vary different parameters, such as the dataset preprocessing, the structure of the language-model head, and so on. I’m wondering what the most efficient way to implement the fine-tuning experiments is: maybe a single script with multiple arguments? Or wrapping the fine-tuning code in a Python class and creating different instances with different parameters?
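To make the single-script option concrete, this is roughly what I have in mind (just a sketch, not tested end to end: the script name `finetune.py`, the model, and the dataset are placeholders, not what I’ll necessarily use):

```python
# finetune.py -- sketch of "one script, many arguments";
# model/dataset names below are placeholders
import argparse

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", default="bert-base-uncased")
    parser.add_argument("--dataset_name", default="imdb")
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=2e-5)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--output_dir", default="./results")
    args = parser.parse_args()

    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_name, num_labels=2
    )

    # preprocessing is one of the things I want to vary per experiment,
    # so it is driven by the arguments too
    dataset = load_dataset(args.dataset_name)
    dataset = dataset.map(
        lambda batch: tokenizer(
            batch["text"], truncation=True, max_length=args.max_length
        ),
        batched=True,
    )

    training_args = TrainingArguments(
        output_dir=args.output_dir,
        learning_rate=args.learning_rate,
        num_train_epochs=args.epochs,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        tokenizer=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

Each experiment would then just be a different invocation, e.g. `python finetune.py --model_name roberta-base --max_length 256`.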

Then there’s the question of where to run this code. I don’t have physical access to a large machine with a GPU; at best I’ll be getting Google Cloud credits to rent an instance with a GPU/TPU. In that case, how should I package my code for quick access? Upload it to a GitHub repo and pull it for every experiment? Build a Docker image that encapsulates everything?
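For the Docker option, I was picturing something as minimal as this (again just a sketch: the base image and the entry point are assumptions on my part, not a tested setup):

```dockerfile
# minimal sketch -- base image and script name are placeholders
FROM pytorch/pytorch:latest

WORKDIR /workspace

# pull in the experiment code and its dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# each experiment run just overrides the script arguments
ENTRYPOINT ["python", "finetune.py"]
```

That way every rented instance would start from the same environment, and each run would only differ in the arguments passed to the entrypoint.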

These may be silly questions, but I don’t have any close contacts with extensive state-of-the-art NLP experience, and many papers don’t go into much detail about how their code is structured. Any help will be greatly appreciated!