When I run this code (transformers version 4.44.2):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

and then print the model object, I get:
M2M100ForConditionalGeneration(
  (model): M2M100Model(
    (shared): M2M100ScaledWordEmbedding(256206, 1024, padding_idx=1)
    (encoder): M2M100Encoder(
      (embed_tokens): M2M100ScaledWordEmbedding(256206, 1024, padding_idx=1)
      (embed_positions): M2M100SinusoidalPositionalEmbedding()
      (layers): ModuleList(
        (0-11): 12 x M2M100EncoderLayer(
          (self_attn): M2M100Attention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          .........
Is there any reason why my object is an M2M100 model instead of an NLLB model? I tried this both inside a Jupyter notebook and in my local environment, and I get the same result.
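
In case it helps narrow things down, I also inspected the checkpoint's config directly (a quick sketch using transformers' AutoConfig; the values in the comments are what I see for this checkpoint):

from transformers import AutoConfig

# Check which architecture the Auto classes resolve this checkpoint to
config = AutoConfig.from_pretrained("facebook/nllb-200-distilled-600M")
print(config.model_type)     # m2m_100
print(config.architectures)  # ['M2M100ForConditionalGeneration']

So the config itself seems to point at the M2M100 classes, but I'm not sure whether that is expected behavior for an NLLB checkpoint or a sign that something is wrong on my end.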