Progress bar for HF pipelines

vblagoje · July 18, 2022, 8:01am

Hello everyone,

Is there a way to attach progress bars to HF pipelines? For example, with the summarization pipeline I often pass in a dozen texts and would love to show the user how many of them have been summarized so far.

TIA,
Vladimir

merve · July 18, 2022, 3:57pm

Hello Vladimir

I saw this feature request where @Narsil says if you make your examples into a Hugging Face Dataset you can see the progress, like below:

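import tqdm
from torch.utils.data import Dataset  # the PyTorch Dataset class

class ListDataset(Dataset):
    def __init__(self, original_list):
        self.original_list = original_list

    def __len__(self):
        return len(self.original_list)

    def __getitem__(self, i):
        return self.original_list[i]

# `pipe` is an existing pipeline; `examples` is your list of input texts.
dataset = ListDataset(examples)

for out in tqdm.tqdm(pipe(dataset)):
    print(out)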

I don’t know of a way to do this without something like tqdm (note that it adds a bit of extra complexity on top of your inference). Below is my code.

from tqdm import tqdm
from transformers import pipeline

generator = pipeline(task="text-generation")
examples = [
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
]

# Run one example per iteration so the bar advances after each text.
for example in tqdm(examples):
    generator(example)

Maybe @osanseviero knows a better way of doing this.

vblagoje · July 18, 2022, 8:37pm

Hey @merve, thanks a bunch. Here is a small example Colab notebook

afriedman412 · August 21, 2022, 1:10am

I couldn’t get this to work … could it be because my pipeline has a tokenizer built-in?

tokenizer = partial(AutoTokenizer.from_pretrained("results_820/checkpoint-10000/"), truncation=True)

def preprocess_data(data):
    encoding = tokenizer(data['text'], truncation=True)
    return encoding

model = AutoModelForSequenceClassification.from_pretrained("results_820/checkpoint-10000/", num_labels=2)

pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

validation_df = pd.read_csv("validation_set.csv")
validation_dataset = Dataset.from_pandas(validation_df)

for out in tqdm.tqdm(pipe(validation_dataset["text"])):
    print(out)

vblagoje · August 24, 2022, 7:55am

@afriedman412, you need to wrap your data in a torch Dataset, not a Hugging Face dataset.

BrunoSE · November 9, 2022, 9:54pm

@vblagoje @afriedman412 I’m stuck on the same problem. I have a Hugging Face dataset where each text example that I want to predict on has an id (i.e. dataset['test'][index]['text']). I built it from a pandas DataFrame. Could you guide me with an example of how to get a torch Dataset? Thank you!
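
For reference, here is a minimal sketch of what such a wrapper could look like, assuming the split behaves like a sequence of dicts (which is how dataset['test'][index]['text'] above reads). The class name TextColumnDataset and its key parameter are made up for illustration, and pipe stands for whatever pipeline you already have:

```python
from torch.utils.data import Dataset


class TextColumnDataset(Dataset):
    """Expose one text column of a row-indexable dataset as a torch Dataset.

    Hypothetical helper, not part of transformers or datasets.
    """

    def __init__(self, rows, key="text"):
        self.rows = rows  # e.g. dataset["test"]; anything with len() and rows[i][key]
        self.key = key

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        return self.rows[i][self.key]


# A plain list of dicts stands in for dataset["test"] here.
rows = [{"id": 7, "text": "first text"}, {"id": 8, "text": "second text"}]
texts = TextColumnDataset(rows)

# With a real pipeline `pipe` (assumed to exist), the loop would be:
# for out in tqdm(pipe(texts)):
#     print(out)
```

Outputs should come back in input order, so the ids can be matched up afterwards by position. transformers also ships a similar helper, KeyDataset in transformers.pipelines.pt_utils, which may save you writing the class yourself.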

maiia-bocharova · December 22, 2022, 11:03am

One workaround which I used:

results = []
CHUNK_SIZE = 100
for chunk in tqdm(range(test_df.shape[0] // CHUNK_SIZE + 1)):
    descr = test_df[CHUNK_SIZE * chunk: (CHUNK_SIZE+1) * chunk]['description'].to_list()
    res = nlp(descr)
    results += res

907Resident · June 3, 2023, 12:30am

Nice, this comment by @Maiia was very helpful. Thanks, this helped me see a 140% difference in the execution time of my code.

Also, adding device_map="auto" to the pipeline object ensures that the code will take advantage of whatever hardware config you may have. At least, that has been my experience thus far.

ddefranza · October 30, 2023, 1:53pm

This is very helpful and solved my problem getting a tqdm progress bar working with an existing pipeline as well. One note:

I think the calculation of the data range based on chunk and CHUNK_SIZE is off. It should look something more like:

descr = test_df[(CHUNK_SIZE * chunk) : (CHUNK_SIZE * chunk) + CHUNK_SIZE]['description'].to_list()

Either way, thanks again @maiia-bocharova for the excellent template.

marctorsoc · December 24, 2023, 9:58pm

It could really be

descr = test_df[(CHUNK_SIZE * chunk) : CHUNK_SIZE * (chunk + 1)]['description'].to_list()

The problem was factoring out chunk rather than CHUNK_SIZE. Btw, it still complains about not using a Dataset. If someone finds a less hacky way to get a progress bar than this, please post it.
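
Putting the corrected slice together, here is a sketch that steps through the data by start offset, so there is no chunk arithmetic to get wrong and no empty trailing chunk when the length is an exact multiple of the chunk size. run_in_chunks is a made-up name, pipe is assumed to be any callable that takes a list of texts and returns a list of results, and the stand-in pipe below only exists so the snippet runs without downloading a model:

```python
from tqdm import tqdm


def run_in_chunks(pipe, texts, chunk_size=100):
    """Call `pipe` on consecutive slices of `texts`, one progress tick per slice."""
    results = []
    for start in tqdm(range(0, len(texts), chunk_size)):
        results += pipe(texts[start:start + chunk_size])
    return results


# Stand-in for a real pipeline: returns one dict per input text.
def fake_pipe(batch):
    return [{"length": len(text)} for text in batch]


texts = ["x" * i for i in range(250)]
results = run_in_chunks(fake_pipe, texts, chunk_size=100)
```

With the DataFrame from the earlier post, texts would be test_df['description'].to_list() and pipe would be nlp.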
