Hi there,
I am new to the transformer.
While fine-tuning bloom-560m for phishing-email detection, I try to give the whole email a single label:
```python
def tokenizeInputs(inputs):
    tokenized_inputs = tokenizer(inputs["email"], max_length=512, truncation=True)
    word_ids = tokenized_inputs.word_ids()  # not used below
    label = inputs["label"]  # phishing or not
    tokenized_inputs["labels"] = [label]
    return tokenized_inputs
```
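To show what I mean by "one label per email", here is a toy sketch of the difference between sequence-level and token-level labeling. The whitespace "tokenizer" and the 0/1 label mapping are stand-ins, not the real bloom tokenizer:

```python
# Toy sketch: contrast sequence-level labeling (one label per email)
# with token-level labeling (one label per token). Splitting on
# whitespace here stands in for a real subword tokenizer.
email = "Thank you Katie. I will be with David as well."
label = 1  # assumed mapping: 1 = phishing, 0 = legitimate

tokens = email.split()

# Sequence classification: the whole email gets exactly one label.
seq_example = {"input_ids": tokens, "labels": label}

# Token classification: the model expects one label per token,
# so `labels` must be as long as `input_ids`.
tok_example = {"input_ids": tokens, "labels": [label] * len(tokens)}

print(len(seq_example["input_ids"]), seq_example["labels"])  # 10 1
print(len(tok_example["labels"]))                            # 10
```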
So each email should end up with exactly one label, right?
But after training, when I try to get the output:
```python
inputs = tokenizer(
    # "HuggingFace is a company based in Paris and New York",
    "Thank you Katie.\nI will be with David as well.\n",
    add_special_tokens=False, return_tensors="pt"
)
# inputs = tokenizer(example["email"])

with torch.no_grad():
    logits = model_tuned(**inputs).logits
print(logits)

predicted_token_class_ids = logits.argmax(-1)
print(predicted_token_class_ids[0])

# Note that tokens are classified rather than input words, which means
# there might be more predicted token classes than words.
# Multiple token classes might account for the same word.
predicted_tokens_classes = [model_tuned.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
```
The result looks like:

```
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
I get a prediction for each token, but not a single prediction for the whole email.
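In case it helps explain what I am after: a token-classification head emits one logit vector per token, so to get one email-level answer the per-token logits would have to be pooled somehow. A toy sketch with made-up numbers (no transformers needed), pooling by averaging logits over tokens:

```python
# Toy illustration: a token-classification head yields one logit
# vector per token; averaging the logits over tokens and taking the
# argmax gives a single prediction for the whole sequence.
token_logits = [
    [2.0, -1.0],   # token 1: favors class 0
    [-0.5, 1.5],   # token 2: favors class 1
    [1.0, 0.0],    # token 3: favors class 0
]

num_classes = len(token_logits[0])
mean_logits = [
    sum(tok[c] for tok in token_logits) / len(token_logits)
    for c in range(num_classes)
]
predicted_class = max(range(num_classes), key=lambda c: mean_logits[c])
print(predicted_class)  # 0
```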
I have searched for this topic but found little that helped.
Could you guys advise me on this? Thanks.