| Topic | Replies | Views | Activity |
|---|---|---|---|
| Trainer RuntimeError: The size of tensor a (462) must match the size of tensor b (448) at non-singleton dimension 1 | 16 | 30025 | April 11, 2024 |
| How to properly UPCAST the model weights to float32? | 2 | 84 | April 11, 2024 |
| Kosmos-2 Fine tuning | 35 | 711 | April 11, 2024 |
| Shouldn't RobertaForCausalLM generate something? | 8 | 1053 | April 11, 2024 |
| How many GB of RAM do I need to train DBRX? | 2 | 89 | April 11, 2024 |
| Tensor size error when generating embeddings for documents using pre-trained models | 3 | 96 | April 11, 2024 |
| Search models by tokenizer | 0 | 41 | April 10, 2024 |
| Fine-Tune LoRA adapter starting from existing adapter | 1 | 157 | April 10, 2024 |
| NotImplementedError: Cannot copy out of meta tensor; no data! | 2 | 5043 | April 10, 2024 |
| Seeking Clarification: Model Evaluation - Train and Val loss | 3 | 84 | April 10, 2024 |
| Development status of huggingface/tflite-android-transformers and modern alternatives | 0 | 70 | April 10, 2024 |
| Is LLaMA rotary embedding implementation correct? | 5 | 2908 | April 10, 2024 |
| Exporting UDOP to ONNX fails | 0 | 86 | April 8, 2024 |
| IndexError: index out of range in self while training a language model from scratch | 0 | 67 | April 9, 2024 |
| Processing the [-100] Mask in SFT | 2 | 74 | April 9, 2024 |
| Server-side Audio Processing in Node.js | 0 | 53 | April 8, 2024 |
| Torchrun uses more vram than running the script with python directly | 0 | 69 | April 8, 2024 |
| New pipeline for zero-shot text classification | 105 | 67003 | April 8, 2024 |
| Subject: Issues with Custom Model Saving Behavior Using Trainer Class in LVLM Training | 0 | 49 | April 8, 2024 |
| Speeding up the inference for marian MT | 4 | 2117 | April 8, 2024 |
| Loading only pre-trained backbone for Mask2Former | 0 | 64 | April 8, 2024 |
| Hardware Requirements \| Fine tuning Pegasus Large | 1 | 829 | April 8, 2024 |
| Slower train with collator for completion only | 1 | 566 | April 7, 2024 |
| Error Loading Custom Transformers.js model from hugging face hub | 0 | 90 | April 7, 2024 |
| Mistral trouble when fine-tuning : Don't set pad_token_id = eos_token_id | 5 | 590 | April 7, 2024 |
| How does Gemini 1.5 achieve 10M context window? | 0 | 88 | April 7, 2024 |
| How to run hf MoE series model in an expert parallel manner? | 0 | 65 | April 7, 2024 |
| Flash attention has no effect on inference | 5 | 2417 | April 6, 2024 |
| What should I do if I want to use model from DeepSpeed | 5 | 1379 | April 6, 2024 |
| Setting up separate device for validation in Trainer? | 0 | 45 | April 6, 2024 |