Hi,
I’m experimenting with building a custom LLM architecture that isn’t currently supported in transformers. I’d like to know the recommended way to integrate it so I can use AutoModelForCausalLM-style loading and push it to the Hub for inference.
Specifically, I’m wondering:
1. What are the minimal steps and files required to register a new model architecture with transformers?
2. Do I need to fork the library and add it under src/transformers/models/ with config, modeling, and tokenizer files, or is there a lighter way (e.g. the transformers custom code / trust_remote_code integration; I've put a rough sketch of what I mean below)?
3. How does one publish such a model to the Hub so that from_pretrained can discover and load it automatically?
4. Are there any best practices or examples to follow for adding new architectures (especially LLMs)?
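To make question 2 concrete, here is a rough sketch of what I think the custom-code path looks like. Class names like MyLlmConfig / MyLlmForCausalLM and the repo id are placeholders, and the forward pass is obviously oversimplified; please correct me if any of this is off:

```python
# Sketch of the "custom code on the Hub" route as I understand it. The real
# classes would live in their own .py files (e.g. configuration_my_llm.py and
# modeling_my_llm.py) so transformers can copy the code next to the weights.
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class MyLlmConfig(PretrainedConfig):
    model_type = "my_llm"  # unique identifier for the architecture

    def __init__(self, vocab_size=32000, hidden_size=512, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class MyLlmForCausalLM(PreTrainedModel):
    config_class = MyLlmConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    def forward(self, input_ids, **kwargs):
        # Heavily simplified; a real causal LM would return a
        # CausalLMOutputWithPast, handle attention masks, caching, etc.
        return self.lm_head(self.embed(input_ids))


# Register the classes so (as I understand it) the auto_map entries end up in
# config.json and the config/modeling code is uploaded alongside the weights.
MyLlmConfig.register_for_auto_class()
MyLlmForCausalLM.register_for_auto_class("AutoModelForCausalLM")

config = MyLlmConfig()
model = MyLlmForCausalLM(config)
model.push_to_hub("my-username/my-custom-llm")  # placeholder repo id
```

And then on the consumer side I'd expect loading to work like this, with the user opting in to the remote code:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "my-username/my-custom-llm", trust_remote_code=True
)
```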
Thanks in advance for the guidance!