Hi All,
I am new to transformers and I am trying to solve a text classification problem. I am using the transformers library to bring a pre-trained transformer into SageMaker and fine-tune it on my dataset, following the steps in this link: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub.
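For context, my fine-tuning setup roughly follows that notebook; the sketch below is a simplified version of it (the hyperparameters, instance type, and S3 paths are placeholders, not my exact values):

```python
from sagemaker.huggingface import HuggingFace

# hyperparameters passed to the training script (placeholder values)
hyperparameters = {
    "epochs": 1,
    "train_batch_size": 32,
    "model_name": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",        # fine-tuning script from the notebook
    source_dir="./scripts",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
)

# start the training job on the tokenized datasets already uploaded to S3
huggingface_estimator.fit({"train": training_input_path, "test": test_input_path})
```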
Now that my model data is saved at an S3 location, I want to use it at inference time. I am using the code below to create a HuggingFaceModel object that reads in my model data and runs predictions by deploying it to an endpoint:
```python
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://models/my-bert-model/model.tar.gz",  # path to your trained SageMaker model
    role=role,                                            # IAM role with permissions to create an endpoint
    transformers_version="4.6",                           # Transformers version used
    pytorch_version="1.7",                                # PyTorch version used
    py_version="py36",                                    # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# example request: you always need to define "inputs"
data = {
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)
```
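For reference, the response I get back from the endpoint only contains the winning label and its score, roughly in this form (the label name and score here are just illustrative):

```python
print(predictor.predict(data))
# -> [{'label': 'LABEL_1', 'score': 0.98}]
```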
However, I am not sure how I can use the "predictor" or the HuggingFaceModel object to get the following things at inference time:
- Class probabilities - predictor.predict() gives me the final class label and a score, whereas I want to see the class probabilities/logits. How can I get them?
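To make it concrete, what I am after is the full distribution over classes, i.e. something like what the sketch below would give me locally (it assumes the fine-tuned model and tokenizer are already loaded in memory):

```python
import torch

# full probability distribution over classes instead of only the argmax label
with torch.no_grad():
    logits = model(**tokenizer("test string", return_tensors="pt")).logits
probs = torch.softmax(logits, dim=-1)  # shape: (batch_size, num_labels)
```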
- The hidden states and layers of the fine-tuned model - if I load a model directly from the Hub, I can use the code below to get the logits, hidden layers, etc.:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext")
tokenizer = AutoTokenizer.from_pretrained("microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext")

inputs = tokenizer("test string", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0)
outputs = model(**inputs, labels=labels, output_hidden_states=True, output_attentions=True)

# outputs[0] gives the loss, outputs[1] gives the logits, and so on...
```
but this is not applicable when I am loading my fine-tuned model through the HuggingFaceModel class object.
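The only workaround I can think of is downloading the model.tar.gz artifact from S3 and loading it locally, roughly as sketched below (the bucket and key come from the model_data path above), but I would prefer a way to get these outputs through the endpoint itself:

```python
import tarfile
import boto3
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# download and unpack the trained artifact produced by the SageMaker training job
s3 = boto3.client("s3")
s3.download_file("models", "my-bert-model/model.tar.gz", "model.tar.gz")
with tarfile.open("model.tar.gz") as tar:
    tar.extractall("my-bert-model")

# load the fine-tuned weights exactly like a local checkpoint
model = AutoModelForSequenceClassification.from_pretrained("my-bert-model")
tokenizer = AutoTokenizer.from_pretrained("my-bert-model")
```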
- Truncation at inference time - how do I truncate my text so that at most 512 tokens are passed to the model, since at inference I cannot apply the tokenizer myself the way I did at training? I tried setting the truncation parameter to True as below (suggested here: How are the inputs tokenized when model deployment? - #12 by philschmid), but it does not work for me:
```python
long_sentence = "..."  # longer than 512 tokens

sentiment_input = {
    "inputs": long_sentence,
    "parameters": {"truncation": True}
}

predictor.predict(sentiment_input)
```
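The only alternative I see is truncating on the client side before calling the endpoint, something like the sketch below (assuming I re-create the tokenizer from the base checkpoint I fine-tuned), but I would rather have the endpoint handle it:

```python
from transformers import AutoTokenizer

# re-create the tokenizer used at training time (the base checkpoint I fine-tuned)
tokenizer = AutoTokenizer.from_pretrained("microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext")

# truncate to the 512-token limit, then send the decoded text to the endpoint
ids = tokenizer(long_sentence, truncation=True, max_length=512)["input_ids"]
truncated_sentence = tokenizer.decode(ids, skip_special_tokens=True)

predictor.predict({"inputs": truncated_sentence})
```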
Looking forward to your replies. Thank you.