Model loading in Colab but not Jupyterlab?!

Hi,
I just fine-tuned TinyLlama as tiny-sajar, a little experiment to test fine-tuning. Running the following code in Google Colab:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with your model's path on the Hub
model = AutoModelForCausalLM.from_pretrained("Dagriffpatchfan/tiny-sajar")
tokenizer = AutoTokenizer.from_pretrained("Dagriffpatchfan/tiny-sajar")

This worked perfectly and loaded the model. I was then able to run the following code:

questions = [
    "Questions here",
]

for question in questions:
    prompt = f"{question}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        max_length=100,         # Maximum total length (prompt + generated tokens)
        num_return_sequences=1, # Number of separate completions to generate
        temperature=0.7,        # Sampling temperature (lower is more focused, higher is more random)
        top_p=0.9,              # Nucleus sampling
        do_sample=True          # Enable sampling
    )

    # Decode the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"**{question}**\n{generated_text}\n")

This generated text as expected. I then tried the same thing in a JupyterLab Space and, to my complete surprise, got the following error when I tried to load the model:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 4
      1 from transformers import AutoModelForCausalLM, AutoTokenizer
      3 # Replace with your model's path on the Hub
----> 4 model = AutoModelForCausalLM.from_pretrained("Dagriffpatchfan/tiny-sajar")
      5 tokenizer = AutoTokenizer.from_pretrained("Dagriffpatchfan/tiny-sajar")
      7 questions = [
      8     "Who are you, and what is your role in the story?",
      9     "How did you come to know David and the Avengers?",
   (...)
     17     "If you had to pick one person to go on a mission with, who would it be and why?"
     18 ]

File ~/miniconda/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:531, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    528 if kwargs.get("quantization_config", None) is not None:
    529     _ = kwargs.pop("quantization_config")
--> 531 config, kwargs = AutoConfig.from_pretrained(
    532     pretrained_model_name_or_path,
    533     return_unused_kwargs=True,
    534     trust_remote_code=trust_remote_code,
    535     code_revision=code_revision,
    536     _commit_hash=commit_hash,
    537     **hub_kwargs,
    538     **kwargs,
    539 )
    541 # if torch_dtype=auto was passed here, ensure to pass it on
    542 if kwargs_orig.get("torch_dtype", None) == "auto":

File ~/miniconda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py:1151, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1148 if pattern in str(pretrained_model_name_or_path):
   1149     return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
-> 1151 raise ValueError(
   1152     f"Unrecognized model in {pretrained_model_name_or_path}. "
   1153     f"Should have a model_type key in its {CONFIG_NAME}, or contain one of the following strings "
   1154     f"in its name: {', '.join(CONFIG_MAPPING.keys())}"
   1155 )

ValueError: Unrecognized model in Dagriffpatchfan/tiny-sajar.
Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, colpali, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dab-detr, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deepseek_v3, deformable_detr, deit, depth_anything, depth_pro, deta, detr, diffllama, dinat, dinov2, dinov2_with_registers, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, emu3, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, gemma3, gemma3_text, git, glm, glm4, glpn, got_ocr2, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, granitemoeshared, granitevision, graphormer, grounding-dino, groupvit, helium, hiera, hubert, ibert, idefics, idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llama4, llama4_text, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mistral3, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, modernbert, moonshine, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phi4_multimodal, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prompt_depth_anything, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_5_vl, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, qwen3, qwen3_moe, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rt_detr_v2, rwkv, sam, sam_vision_model, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, shieldgemma2, siglip, siglip2, siglip_vision_model, smolvlm, smolvlm_vision, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, textnet, time_series_transformer, timesformer, timm_backbone, timm_wrapper, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vitpose, vitpose_backbone, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zamba2, zoedepth

I found this very confusing…does anyone know what I am experiencing?


Since the list includes models as recent as Gemma 3, the Transformers version in the Space is likely close to the latest. The Llama architecture has been supported for a long time, so even a fairly old Transformers version should recognize it. This is indeed a strange error; the cause is probably neither your code nor the model itself.
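To narrow it down, a minimal check you could run in the JupyterLab Space (assuming the repo id above and a standard huggingface_hub install) is to print the Transformers version and inspect the config.json that actually gets downloaded:

import json
import transformers
from huggingface_hub import hf_hub_download

print(transformers.__version__)  # Llama support has been in Transformers for a long time, so any recent version is fine

# Download just the config file and check that it carries a model_type key
config_path = hf_hub_download("Dagriffpatchfan/tiny-sajar", "config.json")
with open(config_path) as f:
    config = json.load(f)
print(config.get("model_type"))  # expected: "llama"

If the version looks sane but model_type comes back as None or the file fails to parse, the problem is with what was downloaded, not with the code.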

There is a possibility that hf_transfer misbehaves in Jupyter environments. In other words, the download itself may be failing, leaving an incomplete or corrupted config.json in the cache.
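If that is the case, one thing worth trying (purely as a guess, not a confirmed fix) is to disable hf_transfer and force a clean re-download before loading the model:

import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"  # must be set before any download starts

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Re-download the whole repo, ignoring whatever is already in the cache
local_dir = snapshot_download("Dagriffpatchfan/tiny-sajar", force_download=True)

model = AutoModelForCausalLM.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)

If this loads cleanly, the original failure was almost certainly a bad download rather than anything wrong with the model repo.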