Code completion models


We’re looking for models suitable for autocompletion, i.e. next-line prediction. Currently our main interest lies in the CodeT5 and CodeBERT models, but on the surface they appear to only do masked token prediction. That said, the banner GIF on the CodeT5 GitHub suggests otherwise: it shows the model completing a whole block of code, but unfortunately we couldn’t find any example code showing how to achieve this.
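To illustrate what we mean by masked token prediction: as far as we understand, CodeT5 fills in sentinel spans (`<extra_id_0>`) rather than continuing code left to right. A minimal sketch (the helper below is ours; the commented-out model usage follows the pattern from the CodeT5 model card):

```python
# CodeT5 predicts the content of <extra_id_0> sentinel spans, which is why it
# doesn't do left-to-right next-line completion out of the box.

SENTINEL = "<extra_id_0>"  # CodeT5's first mask-span token


def mask_at_cursor(code: str, cursor: int) -> str:
    """Insert the sentinel at the cursor position, so the model fills it in."""
    return code[:cursor] + SENTINEL + code[cursor:]


# Feeding the masked code to CodeT5 (downloads the weights, so commented out):
#
# from transformers import RobertaTokenizer, T5ForConditionalGeneration
# tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
# model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
# input_ids = tokenizer(mask_at_cursor(code, cursor), return_tensors="pt").input_ids
# print(tokenizer.decode(model.generate(input_ids, max_length=16)[0],
#                        skip_special_tokens=True))
```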

On the Discord, a suggestion was made to use facebook/incoder-1B, which suits our needs perfectly and produces good autocompletions. However, its inference time seems quite long: in some empirical testing, completions took ~3s, and ~6-10s on CPU.
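For context, a sketch of the kind of timing test we ran; the generation settings (greedy decoding, 32 new tokens) are illustrative choices, not incoder defaults:

```python
# Time a single greedy generation and keep only the first produced line.
import time


def first_line(completion: str) -> str:
    """An autocompletion only needs the first generated line."""
    return completion.split("\n", 1)[0]


def timed_next_line(model, tokenizer, prompt: str, max_new_tokens: int = 32):
    """Return (next-line completion, seconds spent in generate)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    elapsed = time.perf_counter() - start
    # Drop the prompt tokens, decode only the newly generated ones.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return first_line(tokenizer.decode(new_tokens, skip_special_tokens=True)), elapsed


# Usage (facebook/incoder-1B weights are several GB, so commented out):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
# model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")
# line, secs = timed_next_line(model, tokenizer, "def add(a, b):\n    ")
# print(f"{line!r} in {secs:.1f}s")
```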

Are there any other models suitable for autocompletion (next-line token prediction) that can return results in a reasonable time frame (maybe even on CPU)?