I am currently fine-tuning a language model using a policy-gradient reinforcement learning technique. Instead of a standard loss function, I am using a reward function and the REINFORCE algorithm to teach the model to emulate some desired behaviour.
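Schematically, the update looks something like this (a simplified sketch; `rewards` and `log_probs` are hypothetical names for per-sequence tensors, not my actual variables):

```python
# REINFORCE-style update: push up the log-probability of sampled
# sequences in proportion to their reward (simplified sketch)
loss = -(rewards * log_probs).mean()
loss.backward()
```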
As part of the reward function, I compute the ROUGE score between a reference sentence and a generated one. To do this, I'm currently using a list comprehension that zips together two lists of sentences (`ref_batch` and `pred_batch` below) and calculates the ROUGE score for each pair.
The code looks something like this:
```python
import torch
from datasets import load_metric

rouge_metric = load_metric("rouge")

def get_rouge_score(ref, pred):
    # compute() expects lists, so wrap the single reference/prediction pair
    return rouge_metric.compute(rouge_types=["rougeL"], predictions=[pred], references=[ref])["rougeL"].mid.fmeasure

# score each pair one at a time on the CPU, then move the results to the GPU
rouge_scores = torch.tensor([get_rouge_score(ref, pred) for ref, pred in zip(ref_batch, pred_batch)], device=device)
```
The problem is that this is very slow: the list comprehension iterates through the examples one at a time on the CPU, while the rest of the training loop runs on GPU tensor cores. This step is therefore a significant bottleneck; profiling the training step shows that it alone accounts for ~60% of the training time.
So my question: how could I parallelize this step, or otherwise make it faster? If there's a way to calculate the scores on the GPU, that would be awesome. If not, is there an easy way to use multiple CPU cores for this?
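For the multi-core route, I imagine something along these lines (an untested sketch using `multiprocessing`; the pool size of 8 is an arbitrary choice):

```python
from multiprocessing import Pool

def score_pair(pair):
    # unpack one (reference, prediction) pair and score it
    ref, pred = pair
    return get_rouge_score(ref, pred)

# fan the per-pair scoring out over worker processes, then move
# the results onto the GPU in a single transfer
with Pool(processes=8) as pool:
    scores = pool.map(score_pair, list(zip(ref_batch, pred_batch)))
rouge_scores = torch.tensor(scores, device=device)
```

But I don't know whether the per-process overhead makes this worthwhile in practice.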
I'm also open to swapping ROUGE for another metric that is easier to parallelize, if that helps.
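For example, I could imagine something like a batched unigram-overlap F1 computed directly on padded token-ID tensors, which would run entirely on the GPU. A rough sketch (`ref_ids`, `pred_ids`, `vocab_size`, and `pad_id` are hypothetical names, and this is of course not equivalent to ROUGE-L):

```python
import torch

def unigram_f1(ref_ids, pred_ids, vocab_size, pad_id=0):
    """Batched unigram-overlap F1 over (batch, seq_len) token-ID tensors."""
    # build per-example token histograms with scatter_add
    ref_counts = torch.zeros(ref_ids.size(0), vocab_size, device=ref_ids.device)
    pred_counts = torch.zeros(pred_ids.size(0), vocab_size, device=pred_ids.device)
    ref_counts.scatter_add_(1, ref_ids, torch.ones_like(ref_ids, dtype=torch.float))
    pred_counts.scatter_add_(1, pred_ids, torch.ones_like(pred_ids, dtype=torch.float))
    ref_counts[:, pad_id] = 0   # ignore padding tokens
    pred_counts[:, pad_id] = 0
    # multiset overlap between reference and prediction unigrams
    overlap = torch.minimum(ref_counts, pred_counts).sum(dim=1)
    precision = overlap / pred_counts.sum(dim=1).clamp(min=1)
    recall = overlap / ref_counts.sum(dim=1).clamp(min=1)
    return 2 * precision * recall / (precision + recall).clamp(min=1e-8)
```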
Thanks