How to use the TPU in Kaggle to accelerate sentiment analysis with a pre-trained FinBERT model

Hi Guys,

I am currently trying to analyze the sentiment of earnings calls using a pre-trained FinBERT model (yiyanghkust/finbert-tone · Hugging Face).

Since I want to analyze more than 40,000 earnings calls, computing the sentiment scores on my own notebook would take more than 2 weeks. Because of that, I want to use the TPU provided by Kaggle to accelerate the process. I have never done that before, so I don't really know how to go about it. All the tutorials and guides I could find only deal with using the TPU to train a model, but I just want to apply the pre-trained model to the earnings calls without any further training.

This is my code so far. Where do I need to adjust it so that it actually takes advantage of the TPU provided by Kaggle?

First, I import the libraries:

import nltk
import tensorflow as tf
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

Then to activate the TPU in Kaggle:

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
except ValueError:
    tpu = None
    gpus = tf.config.experimental.list_logical_devices("GPU")
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
elif len(gpus) > 1:
    strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
    print('Running on multiple GPUs ', [gpu.name for gpu in gpus])
elif len(gpus) == 1:
    strategy = tf.distribute.get_strategy()
    print('Running on single GPU ', gpus[0].name)
else:
    strategy = tf.distribute.get_strategy()
    print('Running on CPU')
print("Number of accelerators: ", strategy.num_replicas_in_sync)

Then I build the model:

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

for i in range(len(clean_data)):
    print(i)
    # Get the Q&A text of the call and split it into sentences
    temp = clean_data.iloc[i, 3]
    sentences = nltk.sent_tokenize(temp)
    results = nlp(sentences)
    filename = clean_data.iloc[i, 0]

    # Count the number of positive, neutral and negative sentences in the call
    positive = 0
    neutral = 0
    negative = 0
    for result in results:
        label = result["label"]
        if label == "Positive":
            positive += 1
        elif label == "Neutral":
            neutral += 1
        else:
            negative += 1

    # Calculate the sentiment scores as shares of all sentences in the call
    per_pos_qanda = positive / len(results)
    per_neg_qanda = negative / len(results)
    net_score_qanda = per_pos_qanda - per_neg_qanda

    # Save the results in a DataFrame
    finbert_results.iloc[i, 0] = filename
    finbert_results.iloc[i, 7] = per_pos_qanda
    finbert_results.iloc[i, 8] = per_neg_qanda
    finbert_results.iloc[i, 9] = net_score_qanda
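
To make the scoring step easier to check without loading the model, here is a minimal sketch of it as a standalone function. The name `sentiment_scores` and the all-zero return for a call with no sentences are my own assumptions and not part of the loop above. The input is assumed to be a list of dicts with a "label" key, which is what the pipeline returns.

```python
from collections import Counter


def sentiment_scores(results):
    """Return (share positive, share negative, net score) for one call.

    `results` is assumed to be a list of dicts with a "label" key.
    Any label other than "Positive" or "Neutral" is counted as negative,
    mirroring the else branch of the loop.
    """
    total = len(results)
    if total == 0:
        # Assumption: a call without sentences gets neutral scores
        # instead of raising ZeroDivisionError.
        return 0.0, 0.0, 0.0
    counts = Counter(r["label"] for r in results)
    positive = counts["Positive"]
    negative = total - positive - counts["Neutral"]
    per_pos = positive / total
    per_neg = negative / total
    return per_pos, per_neg, per_pos - per_neg


# Hand-made labels standing in for real pipeline output
example = [
    {"label": "Positive"},
    {"label": "Negative"},
    {"label": "Neutral"},
    {"label": "Positive"},
]
print(sentiment_scores(example))  # (0.5, 0.25, 0.25)
```

This only covers the counting. It does not touch the part that is slow, which is the call to the model.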


But if I run this code in Kaggle with the TPU accelerator turned on, it is not faster at all. So where do I need to adjust the code to actually take advantage of the TPU?

Many thanks in advance!

Is there nobody who can help out?
I am still not able to figure it out.