Access Tokenizer from Sagemaker BART Endpoint

Is there a way to access the tokenizer used in the HuggingFacePredictor endpoint?

I’d like to create a list of nested sentences using the same tokenizer I will use in a downstream task. How would I go about replacing nli_tokenizer() in the create_nested_sentences() function below given that I ran the following in my Sagemaker Notebook instance?

Any advice would be greatly appreciated!

Sagemaker Endpoint:

from sagemaker.huggingface import HuggingFaceModel

hub = {
  'HF_MODEL_ID':'facebook/bart-large-mnli',
  'HF_TASK':'zero-shot-classification'
}

bart = HuggingFaceModel(
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
    env=hub,
    role=role,
)

predictor = bart.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
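For context, the deployed zero-shot endpoint is then invoked along these lines (a minimal sketch; the payload format follows the Hugging Face inference toolkit convention for zero-shot-classification, and the candidate labels are just placeholders):

# Hedged example: calling the deployed zero-shot endpoint
result = predictor.predict({
    "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building.",
    "parameters": {"candidate_labels": ["architecture", "history", "sports"]}
})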

Excerpt I’d like to replicate in Sagemaker
The code below currently works in Google Colab, and I’d like to replicate it using my endpoint, but I am not sure how to access nli_tokenizer() from the predictor above.

!pip install transformers

from transformers import AutoTokenizer
nli_tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

import spacy
nlp = spacy.load('en_core_web_sm')

def create_nested_sentences(document: str, token_max_length=1024):
    # Reference: https://discuss.huggingface.co/t/summarization-on-long-documents/920/7
    nested = []
    sent = []
    length = 0
    # Break up text into sentences
    tokens = nlp(document)

    for sentence in tokens.sents:
        # Use the same tokenizer as the downstream inference
        tokens_in_sentence = nli_tokenizer(str(sentence), truncation=True, padding=False)['input_ids']
        if length + len(tokens_in_sentence) < token_max_length:
            sent.append(sentence)
            length += len(tokens_in_sentence)
        else:
            # Start a new chunk with the current sentence so it is not dropped
            nested.append(sent)
            sent = [sentence]
            length = len(tokens_in_sentence)

    if sent:
        nested.append(sent)
    return nested

document = '''
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. 
During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.
It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). 
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
'''
create_nested_sentences(document, token_max_length = 100)

Hi @pleonova , do I understand correctly that you want to use your logic for nested sentences inside the Sagemaker endpoint?

Hi @marshmellow77, I originally intended to use the nested sentences outside of the endpoint, as a prep step for my long text. However, I guess I could also use that logic in a custom function inside predict_fn(). I was hoping to use the HuggingFace Sagemaker toolkit implementation off the shelf.

My ideal order of operation:

  1. Convert long text into nested sentences based on the token length (create tokens using AutoTokenizer.from_pretrained('facebook/bart-large-mnli'))
  2. Feed those nested sentences into a conditional generation model (BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn"))
  3. Feed the summary results into a sequence classification model (AutoModelForSequenceClassification.from_pretrained(nli_model_name))

^ I’d like to do the above using Sagemaker Endpoints. I currently have this working on my hugging face app here.

I see. In that case it seems to me that you can override the relevant methods in the SageMaker Hugging Face Inference Toolkit:

  • You can load both models in the model_fn() method
  • You can override input_fn() to pre-process the data
  • You can chain the conditional generation model and the sequence classification model in the predict_fn() method

See this documentation and this example. Hope this helps.
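For what it’s worth, a rough sketch of such an inference.py could look like the code below (the handler signatures follow the inference toolkit docs; the chaining logic and request format are illustrative assumptions, not a tested implementation):

# inference.py -- rough sketch of the override pattern described above
import json
from transformers import pipeline

def model_fn(model_dir):
    # Load both models once when the endpoint container starts
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    return summarizer, classifier

def input_fn(input_data, content_type):
    # Pre-process the request body; the nested-sentence splitting could live here
    return json.loads(input_data)

def predict_fn(data, models):
    # Chain the models: summarize first, then classify the summary
    summarizer, classifier = models
    summary = summarizer(data["inputs"], truncation=True)[0]["summary_text"]
    return classifier(summary, candidate_labels=data["parameters"]["candidate_labels"])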

Cheers
Heiko

Thank you Heiko! I wasn’t sure if I was missing anything in terms of being able to access the tokenizer from the endpoint without chaining the models in the custom predict_fn().