How to add a custom LLM architecture to transformers

Hi,

I’m experimenting with building a custom LLM architecture that isn’t currently supported in transformers. I’d like to know the recommended way to integrate it so I can use AutoModelForCausalLM-style loading and push it to the Hub for inference.

Specifically, I’m wondering:

What are the minimal steps/files required to register a new model architecture with transformers?

Do I need to fork the library and add it under src/transformers/models/ with config, modeling, and tokenizer files, or is there a lighter way (e.g. transformers custom code integration)?

How does one publish such a model to the Hub so that from_pretrained can discover and load it automatically?

Are there any best practices / examples for adding new architectures (especially LLMs)?

Thanks in advance for the guidance!


Transformers is designed so that models can be self-contained within a single .py file whenever possible, so I think it’s best to start by actually writing the config and model classes. At that point you can already use them locally, or load them remotely with trust_remote_code=True.
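A minimal sketch of that first step, assuming placeholder names (MyLlmConfig / MyLlmForCausalLM) and a stand-in backbone where your real architecture would go:

```python
import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput


class MyLlmConfig(PretrainedConfig):
    model_type = "my_llm"  # must be unique; used for Auto* registration later

    def __init__(self, vocab_size=32000, hidden_size=512, num_layers=4, num_heads=8, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_heads = num_heads
        super().__init__(**kwargs)


class MyLlmForCausalLM(PreTrainedModel):
    config_class = MyLlmConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        # Placeholder backbone -- replace with your custom blocks.
        self.layers = nn.ModuleList(
            [
                nn.TransformerEncoderLayer(config.hidden_size, config.num_heads, batch_first=True)
                for _ in range(config.num_layers)
            ]
        )
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()  # weight init / tying hooks from PreTrainedModel

    def forward(self, input_ids, attention_mask=None, labels=None, **kwargs):
        hidden = self.embed(input_ids)
        # Causal mask so each position only attends to earlier positions.
        seq_len = input_ids.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=input_ids.device),
            diagonal=1,
        )
        for layer in self.layers:
            hidden = layer(hidden, src_mask=causal_mask)
        logits = self.lm_head(hidden)

        loss = None
        if labels is not None:
            # Standard next-token prediction loss.
            shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
            shift_labels = labels[:, 1:].reshape(-1)
            loss = nn.functional.cross_entropy(shift_logits, shift_labels)
        return CausalLMOutput(loss=loss, logits=logits)
```

Because these subclass PretrainedConfig / PreTrainedModel, save_pretrained and from_pretrained already work on them directly.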

Next, you can register it locally with the Auto classes (AutoConfig, AutoModelForCausalLM, AutoTokenizer, etc.).
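For the local registration, something along these lines (same placeholder names as above):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Map the model_type string to the config, and the config to the model class.
AutoConfig.register("my_llm", MyLlmConfig)
AutoModelForCausalLM.register(MyLlmConfig, MyLlmForCausalLM)

# After that, the usual Auto API works on local checkpoints:
config = MyLlmConfig()
model = MyLlmForCausalLM(config)
model.save_pretrained("./my-llm-checkpoint")

reloaded = AutoModelForCausalLM.from_pretrained("./my-llm-checkpoint")
```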

Even if you eventually want to contribute the model upstream on GitHub, forking the repository itself can wait until much later.
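If the immediate goal is just Hub inference rather than an upstream contribution, pushing the custom code alongside the weights is usually enough. A rough sketch (the repo name is a placeholder, and the classes need to live in a regular .py file, not a notebook cell, for the code upload to work):

```python
from transformers import AutoModelForCausalLM

# Mark the classes so their source file gets uploaded and referenced
# in the repo's config.json ("auto_map").
MyLlmConfig.register_for_auto_class()
MyLlmForCausalLM.register_for_auto_class("AutoModelForCausalLM")

model.push_to_hub("your-username/my-llm")  # needs a prior `huggingface-cli login`

# Afterwards, anyone can load it without forking transformers:
model = AutoModelForCausalLM.from_pretrained(
    "your-username/my-llm", trust_remote_code=True
)
```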