| Topic | Replies | Views | Date |
|---|---|---|---|
| ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable | 1 | 318 | May 20, 2024 |
| "No token was detected" when using Hosted inference API | 3 | 751 | May 20, 2024 |
| Fine-tuning BERT for vulnerability detection with data sharing the same label | 0 | 93 | May 17, 2024 |
| TypeError: MistralModel.__init__() got an unexpected keyword argument 'safe_serialization' | 0 | 360 | May 17, 2024 |
| Training Longformer works on jupyter notebook but not with .py | 0 | 88 | May 17, 2024 |
| Mixtral-8x7B trained with `--load_in_4bit`, showed as Tensor type F32 | 3 | 151 | May 17, 2024 |
| Which is actually used to configure scheduler in deepspeed and TrainingArguments? | 0 | 92 | May 17, 2024 |
| How to control the GPU id for loading model weights when fintune Llama8B model with the Trainer? | 0 | 76 | May 17, 2024 |
| Cannot import name 'WhisperForAudioClassification | 0 | 139 | May 16, 2024 |
| Isn't KV cache influenced by position encoding in inference? | 3 | 850 | May 16, 2024 |
| ModuleNotFoundError: No module named 'transformers.agents' | 2 | 664 | May 16, 2024 |
| Why follow Flan-T5 template when T5 tokenizer ignores multiple newlines | 0 | 112 | May 15, 2024 |
| Decoder only model - how to have it not include the prompt in its output? | 3 | 581 | May 15, 2024 |
| ValueError: Unrecognized configuration class <class 'transformers.models.whisper.configuration_whisper.WhisperConfig'> | 0 | 238 | May 15, 2024 |
| KeyError: 'eval_qwk' when used get_peft_model | 0 | 112 | May 14, 2024 |
| Fedrated Learning using trainer | 0 | 72 | May 14, 2024 |
| Next sentence prediction on custom model | 3 | 3382 | May 14, 2024 |
| Llama 3 tokenizer prints cryptic message | 0 | 157 | May 13, 2024 |
| Whisper Inference RuntimeError: The expanded size of the tensor (3000) must match the existing size (3392) at non-singleton dimension 1. Target sizes: [80, 3000]. Tensor sizes: [80, 3392] | 1 | 712 | May 13, 2024 |
| HUBERT Implementation with increased vocabulary size | 0 | 85 | May 13, 2024 |
| Model Parralelism approach in Llama Code looks like very inefficient | 0 | 95 | May 13, 2024 |
| ValueError when training on a multi GPU setup and DPO | 0 | 238 | May 13, 2024 |
| Transformer shifting output question | 1 | 337 | May 13, 2024 |
| Not able to add data_collator to Trainer | 1 | 517 | May 13, 2024 |
| How can we automatically run the script with a token included in a script | 0 | 82 | May 13, 2024 |
| Index Error: Target {} is out of bounds | 0 | 263 | May 13, 2024 |
| Load_in_8bit vs. loading 8-bit quantized model | 6 | 6385 | May 13, 2024 |
| Convert Conv1D to nn.Linear | 2 | 922 | May 12, 2024 |
| SFTrainer doesn't show added column | 0 | 101 | May 12, 2024 |
| How can I keep use of the base model version for inference after fine-tuning | 1 | 93 | May 12, 2024 |