Progress bar for HF pipelines

Hello everyone,

Is there a way to attach progress bars to HF pipelines? For example, with the summarization pipeline I often pass in a dozen texts and would love to show the user how many have been summarized so far.

TIA,
Vladimir

Hello Vladimir :wave:

I saw this feature request where @Narsil says that if you wrap your examples in a Dataset you can see the progress, like below:

import tqdm
from torch.utils.data import Dataset

class ListDataset(Dataset):
    def __init__(self, original_list):
        self.original_list = original_list

    def __len__(self):
        return len(self.original_list)

    def __getitem__(self, i):
        return self.original_list[i]

dataset = ListDataset(my_texts)  # wrap a plain list of inputs

for out in tqdm.tqdm(pipe(dataset)):
    print(out)

I don’t know of a way to do this without something like tqdm (note that it adds extra complexity on top of your inference). Below is my code:

from tqdm import tqdm
from transformers import pipeline

generator = pipeline(task="text-generation")
examples = [
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
]

for i in tqdm(range(len(examples))):
    generator(examples[i])

Maybe @osanseviero knows a better way of doing this.


Hey @merve, thanks a bunch. Here is a small example Colab notebook

I couldn’t get this to work … could it be because my pipeline has a tokenizer built-in?

from functools import partial

import pandas as pd
import tqdm
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TextClassificationPipeline)

tokenizer = partial(AutoTokenizer.from_pretrained("results_820/checkpoint-10000/"), truncation=True)

def preprocess_data(data):
    encoding = tokenizer(data['text'], truncation=True)
    return encoding

model = AutoModelForSequenceClassification.from_pretrained("results_820/checkpoint-10000/", num_labels=2)

pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

validation_df = pd.read_csv("validation_set.csv")
validation_dataset = Dataset.from_pandas(validation_df)

for out in tqdm.tqdm(pipe(validation_dataset["text"])):
    print(out)

@afriedman412, you need to wrap your data in a torch Dataset, not a Hugging Face dataset.

@vblagoje @afriedman412 I'm stuck on the same problem. I have a Hugging Face dataset where each text example that I want to predict on has an id (i.e. dataset['test'][index]['text']). I got this from a pandas dataframe. Could you guide me with an example of how to get a torch dataset? Thank you!
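A minimal sketch of the torch-Dataset approach, assuming the `ListDataset` wrapper from earlier in the thread; `texts` and `fake_pipe` are placeholders (with a pandas dataframe you would build the list with `df["text"].tolist()`, and in practice you would pass the dataset to a real transformers pipeline):

```python
from torch.utils.data import Dataset
from tqdm import tqdm

class ListDataset(Dataset):
    """Minimal torch Dataset that wraps a plain Python list."""
    def __init__(self, original_list):
        self.original_list = original_list

    def __len__(self):
        return len(self.original_list)

    def __getitem__(self, i):
        return self.original_list[i]

# Stand-in texts; from a pandas dataframe use df["text"].tolist(),
# from a Hugging Face dataset use dataset["test"]["text"].
texts = ["first example", "second example", "third example"]
torch_dataset = ListDataset(texts)

# `fake_pipe` is a placeholder for a real transformers pipeline,
# which consumes the dataset and yields one result per item.
def fake_pipe(dataset):
    for text in dataset:
        yield {"input": text}

# tqdm draws the progress bar as results stream out one by one.
results = list(tqdm(fake_pipe(torch_dataset), total=len(torch_dataset)))
```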

One workaround which I used:

from tqdm import tqdm

results = []
CHUNK_SIZE = 100
for chunk in tqdm(range(test_df.shape[0] // CHUNK_SIZE + 1)):
    descr = test_df[CHUNK_SIZE * chunk : CHUNK_SIZE * (chunk + 1)]['description'].to_list()
    res = nlp(descr)
    results += res
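The chunk boundaries are easy to get wrong: the slice for chunk `i` should run from `CHUNK_SIZE * i` to `CHUNK_SIZE * (i + 1)`. A quick check with a plain list standing in for the dataframe:

```python
# Verify the chunking arithmetic with a plain list standing in for test_df.
data = list(range(10))
CHUNK_SIZE = 3

n_chunks = len(data) // CHUNK_SIZE + 1  # same formula as the loop above
chunks = [data[CHUNK_SIZE * i : CHUNK_SIZE * (i + 1)] for i in range(n_chunks)]

print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
assert sum(chunks, []) == data  # every element appears exactly once
```

Note that when the length is an exact multiple of `CHUNK_SIZE`, this formula produces one trailing empty chunk, which is harmless here.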