Using Trainer at inference time

Isabella · August 20, 2021, 2:52pm

Hello everyone,
I successfully fine-tuned a model for text classification. Now I would like to run my trained model to get labels for a large test dataset (around 20,000 texts).

So I had the idea to instantiate a Trainer with my model and use the trainer.predict() method on my data. This works fine, but I was wondering whether it makes sense (i.e., whether it is efficient, advisable, and so on) to use a Trainer, which is of course meant for training models, just for inference.

If not, what would be a better way to perform inference on a large dataset? I cannot just pass all the data to model() at once, as I get out-of-memory errors. I guess I would need to batch my data explicitly (whereas Trainer takes care of that part implicitly)…

Thank you in advance for your thoughts on this!

nielsr · August 21, 2021, 8:40am

The Trainer saves checkpoints of your model to a directory, which you specify with the output_dir argument when instantiating the TrainingArguments. You can also save the final model there yourself with trainer.save_model().

You can then load your trained model with the .from_pretrained() method. For example, if you fine-tuned a BertForSequenceClassification model, you can instantiate it as follows:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("path_to_the_directory")

You can then make batched predictions as follows:

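import torch
from transformers import BertTokenizer

# assumes the tokenizer was saved to the same directory (e.g. with tokenizer.save_pretrained)
tokenizer = BertTokenizer.from_pretrained("path_to_the_directory")

text = ["this is one sentence", "this is another sentence"]
# pad/truncate so sentences of different lengths can be stacked into one tensor
encoding = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

# forward pass; no_grad() skips gradient bookkeeping, which saves memory
with torch.no_grad():
    outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)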

BramVanroy · August 22, 2021, 8:56pm

Considering efficiency, the Trainer should be perfectly fine. You may wish to handle some specific optimisations, though. See this post: "Faster and smaller quantized NLP with Hugging Face and ONNX Runtime" by Yufeng Li (Microsoft Azure, on Medium).

Isabella · August 23, 2021, 9:59am

Yep, this works fine as long as there are only a few sentences to process, but in my case, with about 20,000 of them, I soon run out of memory if I try to pass all the sentence encodings to model() at once. I guess I could write a for loop around the forward pass to process one sentence at a time, but that doesn't look very performant. The "right" way, I guess, is to run inference on mid-sized batches, which is what Trainer.predict() does under the hood, so I was being lazy and tried to take advantage of that rather than writing the batching logic myself.
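For anyone who does want to write the batching themselves, here is a minimal sketch of a manual loop. It assumes a fine-tuned sequence-classification model and its tokenizer, both loaded with from_pretrained() as in the earlier reply. The helper names chunked and predict_in_batches, and the default batch size of 64, are hypothetical choices for illustration, not part of transformers.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts: Sequence[str], model, tokenizer,
                       batch_size: int = 64, device: str = "cpu") -> List[int]:
    """Return the predicted class id for every text, one mid-sized batch at a time."""
    # Imported here so chunked() can be used without PyTorch installed.
    import torch

    model.to(device)
    model.eval()  # disable dropout so predictions are deterministic
    predictions: List[int] = []
    with torch.no_grad():  # no gradient bookkeeping, so far less memory per batch
        for batch in chunked(texts, batch_size):
            # pad only to the longest text in this batch, truncate over-long ones
            encoding = tokenizer(list(batch), padding=True, truncation=True,
                                 return_tensors="pt").to(device)
            logits = model(**encoding).logits
            predictions.extend(logits.argmax(dim=-1).tolist())
    return predictions
```

Usage would look like `predictions = predict_in_batches(texts, model, tokenizer, batch_size=64, device="cuda")`. Only one batch of activations is held in memory at a time, which is what avoids the out-of-memory errors; Trainer.predict() does something similar internally with a DataLoader, plus extras such as multi-GPU support. Sorting the texts by length before batching can reduce padding further, at the cost of having to restore the original order afterwards.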

Isabella · August 23, 2021, 10:00am

Thank you for the pointer, it will surely come in handy when I move this model into production!

NoToken · August 23, 2021, 11:33am

For the rest of us lazy folks: could you share your boilerplate for training your model and then actually using it?

Isabella · August 24, 2021, 4:21pm

Well, in both cases you need to instantiate a Trainer, with slightly different arguments. Something like this, a bit simplified.

For training:

from transformers import Trainer, TrainingArguments

# training arguments for Trainer
training_args = TrainingArguments(
    output_dir = OUTPUT_DIR,
    do_train = True,
    do_eval = True,
    per_device_train_batch_size = BATCH_SIZE,
    learning_rate = 2e-5,
    num_train_epochs = 10,
    dataloader_drop_last = False
)

# init trainer (model is the model you want to fine-tune)
trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = valid_dataset,
    compute_metrics = compute_metrics
)

trainer.train()
# unwrap the model if it was wrapped for distributed/parallel training, then save it
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model
model_to_save.save_pretrained(OUTPUT_DIR)
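Both snippets pass a compute_metrics function that the post does not define. A minimal, hypothetical version computing plain accuracy might look like this, assuming integer class labels and a model whose predictions are just the logits array. Note that during predict() Trainer only calls compute_metrics when the test set contains labels.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy for Trainer's `compute_metrics` hook (hypothetical example).

    Trainer calls this with an EvalPrediction, which unpacks into
    (predictions, label_ids); for sequence classification the predictions
    are the raw logits, one row per example.
    """
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {"accuracy": float((preds == labels).mean())}
```

Any other metric works the same way: take the logits and labels, return a dict of floats, and Trainer will prefix the keys (e.g. eval_accuracy or test_accuracy) in its reported metrics.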

For inference:

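from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# loading the model you previously trained
model = AutoModelForSequenceClassification.from_pretrained(OUTPUT_DIR)

# arguments for Trainer
# (do_train/do_predict are not read by Trainer itself, they are flags for your own
#  scripts; per_device_eval_batch_size is what sets the batch size for predict())
test_args = TrainingArguments(
    output_dir = OUTPUT_DIR,
    do_train = False,
    do_predict = True,
    per_device_eval_batch_size = BATCH_SIZE,
    dataloader_drop_last = False
)

# init trainer
trainer = Trainer(
    model = model,
    args = test_args,
    compute_metrics = compute_metrics
)

test_results = trainer.predict(test_dataset)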

Then, from test_results (a PredictionOutput with predictions, label_ids and metrics fields) you can easily derive predicted labels and probabilities.
Of course, you will need to set your own constants/parameters, and there are many more training arguments that can be passed to Trainer, but the main ideas are there.
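To make the "derive predicted labels and probabilities" step concrete, here is a small sketch. It assumes test_results.predictions holds just the logits array of shape (num_examples, num_labels); if your model returns extra outputs, predictions can be a tuple and you would take its first element. The function name logits_to_labels_and_probs is hypothetical.

```python
import numpy as np


def logits_to_labels_and_probs(logits, id2label=None):
    """Convert a (num_examples, num_labels) logits array into predicted
    labels and per-class softmax probabilities."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row-wise max before exponentiating to avoid overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    label_ids = probs.argmax(axis=-1)
    if id2label is None:
        labels = label_ids.tolist()
    else:
        # e.g. model.config.id2label, which maps class ids to label names
        labels = [id2label[int(i)] for i in label_ids]
    return labels, probs
```

With the snippet above, usage would be `labels, probs = logits_to_labels_and_probs(test_results.predictions, model.config.id2label)`; leave out the second argument to get integer class ids instead of names.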

thigner1 · May 23, 2022, 4:44am

Thank you for the great info.

test_args = TrainingArguments(
    output_dir = OUTPUT_DIR,
    do_train = False,
    do_predict = True,
    per_device_eval_batch_size = BATCH_SIZE,
    dataloader_drop_last = False
)

Is there no TestingArguments class?

Isabella · May 31, 2022, 7:38am

Last time I checked (admittedly, quite a long time ago) there was no TestingArguments class, but TrainingArguments with those parameters effectively acts as one.

ndvb · May 4, 2023, 7:35pm

It would be nice if we could instantiate Trainer without an output_dir for inference, or if the predict() function could be run without a Trainer.
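One possible workaround for the output_dir requirement, sketched under the assumption that you only need the returned predictions: point output_dir at a temporary directory that is deleted as soon as predict() finishes. The function name predict_with_temporary_output_dir is hypothetical; TrainingArguments, Trainer, report_to and predict() are the regular transformers APIs used elsewhere in this thread.

```python
import tempfile


def predict_with_temporary_output_dir(model, dataset, batch_size=64):
    """Run Trainer.predict() without leaving an output directory behind.

    TrainingArguments still receives an output_dir, but it points at a
    temporary directory that is removed when the `with` block exits.
    """
    # Imported inside the function so this module loads even without transformers.
    from transformers import Trainer, TrainingArguments

    with tempfile.TemporaryDirectory() as tmp_dir:
        args = TrainingArguments(
            output_dir=tmp_dir,
            per_device_eval_batch_size=batch_size,
            report_to="none",  # don't send anything to logging integrations
        )
        trainer = Trainer(model=model, args=args)
        # predict() returns a PredictionOutput(predictions, label_ids, metrics),
        # which stays valid after the temporary directory is deleted
        return trainer.predict(dataset)
```

For running predictions without any Trainer at all, a plain batched loop over the tokenized texts inside torch.no_grad() is the usual alternative.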
