Using facebook/incoder-1B on Inference API

I’m trying to use the facebook/incoder-1B model via the Inference API. The model has some interesting masking procedures, and I’m unable to replicate the results I get from the authors’ demo / Python code. I wonder if there is some tokenization magic happening in the Inference API that I can’t discern.

This prompt was constructed following the paper and the Python implementation:

print("Hello W<|mask:0|>!")<|mask:0|>

and the API returns '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'. The correct output should look like orld<|endofmask|>. I wonder if there is a tokenizer problem here? Any help would be appreciated, thank you!

Here is the exact code used for inference:

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://api-inference.huggingface.co/models/facebook/incoder-1B"
data = dict(inputs='print("Hello W<|mask:0|>!")<|mask:0|>', 
            options=dict(use_gpu=False, use_cache=False, wait_for_model=True),
            parameters=dict(return_full_text = False, temperature=0.5, top_p=0.95, do_sample=True))
web_response = requests.request(
    "POST", API_URL, headers=headers, data=json.dumps(data))
response = json.loads(web_response.content.decode("utf-8"))
print(response)
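
For comparison, the local generation I’m checking against looks roughly like this (a sketch rather than the exact demo code), and it does give an orld<|endofmask|>-style completion:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")

prompt = 'print("Hello W<|mask:0|>!")<|mask:0|>'
# the tokenizer's default add_special_tokens=True prepends <|endoftext|> to the ids
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16,
                            do_sample=True, temperature=0.5, top_p=0.95)
# decode only the newly generated tokens (the infill)
print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:]))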

This seems to be because the Inference API uses a text-generation pipeline, which doesn’t add the special <|endoftext|> token that should be at the start of every prompt (see transformers/text_generation.py at 215e0681e4c3f6ade6e219d022a5e640b42fcb76 · huggingface/transformers · GitHub), while the tokenizer we’re using adds it by default unless add_special_tokens is set to False.
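
To see the difference, you can compare the default tokenization against add_special_tokens=False (a quick check assuming the hub tokenizer loads as usual; the exact ids depend on the vocabulary):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/incoder-1B")
prompt = 'print("Hello W<|mask:0|>!")<|mask:0|>'

default_ids = tok(prompt)["input_ids"]                         # add_special_tokens=True by default
bare_ids = tok(prompt, add_special_tokens=False)["input_ids"]  # roughly what the pipeline sends
# the default encoding should start with <|endoftext|>; the bare one should not
print(tok.convert_ids_to_tokens(default_ids[:1]))
print(tok.convert_ids_to_tokens(bare_ids[:1]))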

I’m not sure whether that’s intended behavior for the pipeline or not, but as a workaround you should be able to pass prefix='<|endoftext|>' in parameters, e.g.

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://api-inference.huggingface.co/models/facebook/incoder-1B"
data = dict(inputs='print("Hello W<|mask:0|>!")<|mask:0|>', 
            options=dict(use_gpu=False, use_cache=False, wait_for_model=True),
            parameters=dict(return_full_text = False, temperature=0.5, top_p=0.95, do_sample=True, prefix="<|endoftext|>"))
web_response = requests.request(
    "POST", API_URL, headers=headers, data=json.dumps(data))
response = json.loads(web_response.content.decode("utf-8"))
print(response)

I should also say that this example will produce some unexpected text like “Hello Who Are You” rather than “Hello World”, because “Hello World” is probably a single complete token in our vocabulary (we trained the tokenizer to allow multi-word tokens, and this is likely a common enough phrase for it to learn a single token), so the token ids for “Hello W” are probably not a prefix of the token ids for “Hello World”. You’ll likely get the best results when infilling entire lines, or multiple lines, since we didn’t allow token merges across newlines.
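
As a rough illustration (the whole-line prompt below is just an example), you can compare the two tokenizations and try masking a full line instead:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/incoder-1B")

partial_ids = tok('print("Hello W', add_special_tokens=False)["input_ids"]
full_ids = tok('print("Hello World', add_special_tokens=False)["input_ids"]
# if partial_ids is not a prefix of full_ids, the model cannot complete the word
# just by continuing the prompt's tokens, so mid-word infills are unreliable
print(partial_ids)
print(full_ids)

# masking an entire line usually behaves better, since tokens never merge across newlines
line_prompt = 'def greet(name):\n<|mask:0|>\n\ngreet("World")\n<|mask:0|>'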
