Looking for a tool class to do predictions

I am looking for a tool class to do predictions with BERT models, like the Trainer class but for prediction time. There is much more to it than just logits = model(input_ids). When you have a large list of input texts, you have to process it in batches or you get an OOM from your GPU, and there is more advanced stuff like smart batching (sorting the sentences by token length and grouping short sentences together, so each batch needs minimal padding).
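The smart-batching idea described above can be sketched roughly like this. This is only a minimal illustration: it uses a naive whitespace split as a stand-in for a real tokenizer's token counts, and the predict_batch callable is a placeholder for the actual model call.

```python
def smart_batches(texts, batch_size):
    """Group texts of similar length to minimize per-batch padding.

    Sorts indices by approximate token count (naive whitespace split here;
    a real implementation would use the tokenizer), then yields batches of
    (original indices, texts) so predictions can be restored to input order.
    """
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]


def predict_in_order(texts, batch_size, predict_batch):
    """Run predict_batch over smart batches and restore the original order."""
    results = [None] * len(texts)
    for idx, batch in smart_batches(texts, batch_size):
        for i, pred in zip(idx, predict_batch(batch)):
            results[i] = pred
    return results
```

A real version would tokenize once up front, sort by actual token length, and run the model under torch.no_grad() inside predict_batch.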

Is there something like this?


Hi Philip,

maybe the HF Pipelines implementation could help:

It also supports multiple sentences (List[str]) as inputs, and you can of course use your own trained and fine-tuned models. The detailed implementation can be found here:

To use it for text classification/sentiment analysis:

from transformers import pipeline

nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif('Such a nice weather outside !')

and it also outputs the score:

[{'label': 'POSITIVE', 'score': 0.9997656}]

I like using Pipelines especially for testing NER models :hugs:

I bet! :wink:

Thanks for the answer.

@stefan-it although I see no support for batching. At least not in the TextClassificationPipeline.

When I follow the naive approach:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TextClassificationPipeline)

model = AutoModelForSequenceClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=0)
predictions = pipeline(unlabeled_text_list)

I get a CUDA OOM if the list (unlabeled_text_list) is long…