Using a fine-tuned model for inference

Hello guys,

I am very new to this, so I apologise if this is a dumb question, but I am finding it surprisingly hard to simply use an already trained transformer model for inference.

Basically, I am trying to apply ClimateBERT to a dataset that I got from another author (ECOLEX_Legislation.csv).

I have run into quite a few issues along the way and managed to (mostly) solve them, I think. However, the code below has now been running for a few hours and I am not sure why, so I would appreciate any help with this.

from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

# Load the CSV as a Hugging Face dataset
df = load_dataset("csv", data_files="ECOLEX_Legislation.csv", delimiter=",", split="train")
# Build the classification pipeline
pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")
# Stream the "Policy_Content" column through the pipeline
for out in pipe(KeyDataset(df, "Policy_Content"), truncation=True, max_length=512):
    print(out)

Note that the model seems to work fine if I pipe a single sentence through it; the problem only appears when I pipe it over the entire dataset. Secondly, I was wondering if there are any complete tutorials I could refer to (from loading the dataset to analysing and visualising the results).
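For reference, this is the kind of single-input call that returns a result for me almost immediately (the example sentence here is made up, just to illustrate):

from transformers import pipeline

pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")
# A made-up example sentence, only to show the single-input call that works
print(pipe("This law introduces incentives for renewable energy investment.", truncation=True, max_length=512))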

Thank you so much for reading this, and if you need more information, I am happy to elaborate.

Warm regards,
Yanith

Update: I am now trying to run the code below (just for the sake of trying something new). It has been running for 30 minutes already and I am not sure what is wrong. Also, even though I used tqdm, no progress bar is shown.

from tqdm.auto import tqdm
from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

df = load_dataset("csv", data_files="ECOLEX_Legislation.csv", delimiter=",", split="train")

# Count rows where the column of interest is missing
missing_values = df.filter(lambda example: example["Policy_Content"] is None)
print("Number of missing values:", len(missing_values))

# Define a function that checks whether the column of interest is not missing
def is_not_missing(example):
    return example["Policy_Content"] is not None

# Apply this function to filter out rows with missing values
filtered_dataset = df.filter(is_not_missing)

pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")

# No separate tokenizer call is needed: the pipeline tokenizes internally,
# using the truncation/max_length arguments passed in the loop below
for out in tqdm(pipe(KeyDataset(filtered_dataset, "Policy_Content"), truncation=True, max_length=512), total=len(filtered_dataset)):
    print(out)
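In case it is relevant, here is a batched variant of the same loop that I might try next, collecting the outputs into a list instead of printing them. This is only a rough sketch: the batch_size of 8 is an arbitrary guess, and the device argument assumes a GPU that I may not have.

from tqdm.auto import tqdm
from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

df = load_dataset("csv", data_files="ECOLEX_Legislation.csv", delimiter=",", split="train")
filtered_dataset = df.filter(lambda example: example["Policy_Content"] is not None)

pipe = pipeline(
    "text-classification",
    model="climatebert/distilroberta-base-climate-sentiment",
    # device=0,  # assumption: uncomment only if a GPU is available
)

# batch_size=8 is a guess; larger batches are faster but use more memory
results = []
for out in tqdm(pipe(KeyDataset(filtered_dataset, "Policy_Content"), batch_size=8, truncation=True, max_length=512), total=len(filtered_dataset)):
    results.append(out)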