Topic | Replies | Views | Date
Torchrun uses more vram than running the script with python directly | 1 | 363 | May 27, 2024
How can you switch between adapters in the inference model? | 2 | 406 | May 27, 2024
Need Help Improving Similarity Scores for Follow-up Detection Using BERT or similar | 1 | 113 | May 26, 2024
Fine tuning t5 to write like me | 0 | 177 | May 26, 2024
Is it possible to get the data that is seen by the model during training? | 1 | 124 | May 26, 2024
Parallelize Mistral/ llama2 output | 1 | 154 | May 25, 2024
Decision Transformer a question about the tutorial | 0 | 127 | April 15, 2024
Understanding the Decision Transformer | 0 | 146 | May 25, 2024
How to get the loss from the Trainer class? | 0 | 167 | May 25, 2024
Modify the model input format in a .tflite file generated by the run_image_classification.py script | 0 | 111 | May 24, 2024
How to get list of downloaded models names? | 6 | 5034 | May 24, 2024
Mistral load_in_8bit slow inference | 0 | 252 | May 24, 2024
Perplexity Calculation in run_clm.py | 0 | 278 | May 23, 2024
Can I dynamically add or remove LoRA weights in the transformer library like diffusers | 3 | 942 | May 23, 2024
Is it possible to generate more than one token when using a decoder only model via forward pass? | 1 | 637 | May 23, 2024
Trainer RuntimeError: The size of tensor a (462) must match the size of tensor b (448) at non-singleton dimension 1 | 17 | 45484 | May 23, 2024
ValueError: too many values to unpack (expected 2) or not enough values to unpack (expected 2, got 1). T5ForConditionalGeneration | 0 | 181 | May 23, 2024
T5 tokenizer / ideal method of calculating max_sequence_length? | 1 | 548 | May 22, 2024
Pass input_embed to WhisperDecoder | 0 | 83 | May 22, 2024
How to fix ValueError: The model did not return a loss from the inputs? | 1 | 616 | May 22, 2024
Transformers.js went wrong during the model construction | 0 | 484 | May 21, 2024
System RAM gets full in sometime and ( VideoMAE ) training job is killed | 0 | 65 | May 21, 2024
What data batch does SFTTrainer looks at when resumed training | 0 | 109 | May 21, 2024
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit' | 7 | 20551 | October 7, 2023
ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable | 1 | 344 | May 20, 2024
"No token was detected" when using Hosted inference API | 3 | 756 | May 20, 2024
Fine-tuning BERT for vulnerability detection with data sharing the same label | 0 | 100 | May 17, 2024
TypeError: MistralModel.__init__() got an unexpected keyword argument 'safe_serialization' | 0 | 406 | May 17, 2024
Training Longformer works on jupyter notebook but not with .py | 0 | 91 | May 17, 2024
Mixtral-8x7B trained with `--load_in_4bit`, showed as Tensor type F32 | 3 | 159 | May 17, 2024