Hi,
I just fine-tuned TinyLlama as tiny-sajar, a little experiment to test fine-tuning. I ran the following code in Google Colab:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Replace with your model's path on the Hub
model = AutoModelForCausalLM.from_pretrained("Dagriffpatchfan/tiny-sajar")
tokenizer = AutoTokenizer.from_pretrained("Dagriffpatchfan/tiny-sajar")
It worked perfectly and loaded the model. I was then able to run the following code:
questions = [
    "Questions here",
]

for question in questions:
    prompt = f"{question}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        max_length=100,          # Maximum number of tokens to generate
        num_return_sequences=1,  # Number of separate completions to generate
        temperature=0.7,         # Sampling temperature (lower is more focused, higher is more random)
        top_p=0.9,               # Nucleus sampling
        do_sample=True,          # Enable sampling
    )

    # Decode the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"**{question}**\n{generated_text}\n")
This generated text as expected. I then tried the same code in a JupyterLab Space, and to my complete surprise I got the following error when loading the model:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 4
      1 from transformers import AutoModelForCausalLM, AutoTokenizer
      3 # Replace with your model's path on the Hub
----> 4 model = AutoModelForCausalLM.from_pretrained("Dagriffpatchfan/tiny-sajar")
      5 tokenizer = AutoTokenizer.from_pretrained("Dagriffpatchfan/tiny-sajar")
      7 questions = [
      8     "Who are you, and what is your role in the story?",
      9     "How did you come to know David and the Avengers?",
   (...)
     17     "If you had to pick one person to go on a mission with, who would it be and why?"
     18 ]

File ~/miniconda/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:531, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    528 if kwargs.get("quantization_config", None) is not None:
    529     _ = kwargs.pop("quantization_config")
--> 531 config, kwargs = AutoConfig.from_pretrained(
    532     pretrained_model_name_or_path,
    533     return_unused_kwargs=True,
    534     trust_remote_code=trust_remote_code,
    535     code_revision=code_revision,
    536     _commit_hash=commit_hash,
    537     **hub_kwargs,
    538     **kwargs,
    539 )
    541 # if torch_dtype=auto was passed here, ensure to pass it on
    542 if kwargs_orig.get("torch_dtype", None) == "auto":

File ~/miniconda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py:1151, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1148 if pattern in str(pretrained_model_name_or_path):
   1149     return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
-> 1151 raise ValueError(
   1152     f"Unrecognized model in {pretrained_model_name_or_path}. "
   1153     f"Should have a `model_type` key in its {CONFIG_NAME}, or contain one of the following strings "
   1154     f"in its name: {', '.join(CONFIG_MAPPING.keys())}"
   1155 )

ValueError: Unrecognized model in Dagriffpatchfan/tiny-sajar. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, colpali, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dab-detr, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deepseek_v3, deformable_detr, deit, depth_anything, depth_pro, deta, detr, diffllama, dinat, dinov2, dinov2_with_registers, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, emu3, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, gemma3, gemma3_text, git, glm, glm4, glpn, got_ocr2, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, granitemoeshared, granitevision, graphormer, grounding-dino, groupvit, helium, hiera, hubert, ibert, idefics, idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llama4, llama4_text, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mistral3, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, modernbert, moonshine, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phi4_multimodal, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prompt_depth_anything, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_5_vl, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, qwen3, qwen3_moe, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rt_detr_v2, rwkv, sam, sam_vision_model, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, shieldgemma2, siglip, siglip2, siglip_vision_model, smolvlm, smolvlm_vision, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, textnet, time_series_transformer, timesformer, timm_backbone, timm_wrapper, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vitpose, vitpose_backbone, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zamba2, zoedepth
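For what it's worth, here is a small snippet I was planning to run in both environments to compare them. It just downloads config.json from the Hub and prints whether it contains a model_type key, plus the installed transformers version. This is only a debugging sketch I put together while writing this post, not part of my original notebook:

import json
import transformers
from huggingface_hub import hf_hub_download

# Fetch config.json for the model from the Hub and check for a model_type key
config_path = hf_hub_download("Dagriffpatchfan/tiny-sajar", "config.json")
with open(config_path) as f:
    config = json.load(f)
print("model_type:", config.get("model_type", "<missing>"))

# Colab and the JupyterLab Space may have different transformers versions installed
print("transformers version:", transformers.__version__)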
I found this very confusing. Does anyone know what I am experiencing, and why the same model loads fine in Colab but not in the JupyterLab Space?