I’m currently using HuggingFace Accelerate to train a model and sklearn.metrics.classification_report to get the results. I noticed that the support values for certain classes differ depending on whether I’m using one process or multiple processes.
I asked ChatGPT (lol) whether this could happen, and it said that it can be a problem if the data is unevenly distributed across processes. I’m wondering how true this is: my initial intuition was that even with an uneven distribution, the data should still be divided (even if not perfectly equally) and later gathered, so the support shouldn’t differ.
Please let me know if I’m thinking incorrectly. Thanks.
I’m actually not gathering as in your suggested example. I refactored an existing project slightly so that I could use HF Accelerate without changing things too much.
The code currently gathers using torch.distributed.all_gather_object. More specifically, I accumulate predictions and labels in an intermediate value inside the evaluation loop, then after inference I gather everything into a final array-like object in order to perform evaluation with scikit-learn:
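import torch.distributed as dist

# Per-process accumulator for predictions and labels.
intermediate_value = {}
# One slot per process; all_gather_object fills these in place.
output = [None] * accelerator.num_processes

for step, batch in enumerate(valid_dataloader):
    y_pred = model(batch)
    intermediate_value.setdefault("preds", []).append(y_pred)
    intermediate_value.setdefault("targets", []).append(batch["target"])

# After inference, collect every process's accumulator on all processes.
dist.all_gather_object(output, intermediate_value)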
Is this approach not recommended? I would assume it should work just fine, but I’m wondering whether there would be a difference between this and HF’s approach.
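For reference, the gathered result described above is a list with one dict per process, each holding a list of per-batch predictions and targets, so it still has to be flattened before scikit-learn can consume it. A minimal sketch of that step, assuming plain Python lists instead of tensors; the helper name `flatten_gathered` and the sample data are hypothetical:

```python
def flatten_gathered(output):
    """Merge the per-process dicts produced by a gather into two flat lists."""
    preds, targets = [], []
    for per_process in output:  # one dict per process
        for batch_preds in per_process["preds"]:
            preds.extend(batch_preds)
        for batch_targets in per_process["targets"]:
            targets.extend(batch_targets)
    return preds, targets


# Hypothetical gathered result from two processes, two batches each.
output = [
    {"preds": [[0, 1], [1, 1]], "targets": [[0, 1], [0, 1]]},
    {"preds": [[2, 0], [2, 2]], "targets": [[2, 0], [2, 1]]},
]

preds, targets = flatten_gathered(output)
print(len(preds), len(targets))  # 8 8
```

Note that flattening keeps every gathered item, so any sample that was evaluated on more than one process is counted more than once.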
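You’re going to have extra items in here: when the dataset doesn’t divide evenly across processes, some samples are repeated so that every process gets the same number of batches. So, as always, please use the API (accelerator.gather_for_metrics), which drops those repeats for you; otherwise you’ll need to drop them yourself.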
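The inflated support can be reproduced without any GPUs. The sketch below is a simplified simulation, not Accelerate's actual sharding logic: it assumes the index list is padded by wrapping around to the start so that every process receives the same number of samples. The names `shard_indices` and `support` and the label list are hypothetical.

```python
from collections import Counter


def shard_indices(num_samples, num_processes):
    """Split sample indices across processes, padding by wrapping around
    so every process gets the same number of samples (simplified stand-in
    for a distributed sampler)."""
    per_process = -(-num_samples // num_processes)  # ceiling division
    total = per_process * num_processes
    padded = [i % num_samples for i in range(total)]
    return [padded[rank::num_processes] for rank in range(num_processes)]


def support(labels):
    """Per-class sample counts, i.e. the 'support' column of a report."""
    return dict(Counter(labels))


# Hypothetical validation labels: 10 samples, 3 classes.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

# One process: no padding is needed, every sample appears exactly once.
single = [labels[i] for shard in shard_indices(len(labels), 1) for i in shard]

# Four processes: 10 is not divisible by 4, so two samples are repeated.
shards = shard_indices(len(labels), 4)
gathered_idx = [i for shard in shards for i in shard]
naive = [labels[i] for i in gathered_idx]

# Dropping repeats by sample index restores the true support.
deduped = [labels[i] for i in sorted(set(gathered_idx))]

print(support(single))   # {0: 4, 1: 3, 2: 3}
print(support(naive))    # {0: 6, 1: 3, 2: 3}
print(support(deduped))  # {0: 4, 1: 3, 2: 3}
```

Deduplicating by index only works if each batch carries its sample indices through the evaluation loop; a gather that truncates the result to the dataset length avoids that bookkeeping.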