| Topic | Replies | Views | Activity |
|---|---|---|---|
| Problem with pushing quantized model to hub | 3 | 121 | October 14, 2024 |
| How to use Cache with message API | 0 | 7 | October 13, 2024 |
| How do I do inference using the GPT models on TPUs? | 5 | 2396 | October 13, 2024 |
| Getting token probabilities of a caption given an image from BLIP2 | 4 | 430 | October 13, 2024 |
| Why is BCELoss used for multi-label classification? | 4 | 19 | October 12, 2024 |
| Best practice to train LLMs on long sequences? | 0 | 12 | October 12, 2024 |
| Different Trainers, when to use which? | 1 | 1392 | October 12, 2024 |
| Gradients in Data Collator cause Memory Leak | 4 | 28 | October 12, 2024 |
| Question About the Transformer Concept | 0 | 9 | October 12, 2024 |
| If I use llama 70b and 7b for speculative decoding, how should I put them on my multiple gpus in the code | 0 | 10 | October 11, 2024 |
| Transformers cache not loading from a new vm | 6 | 13 | October 11, 2024 |
| Bitsandbytes quantization and QLORA fine-tuning | 0 | 20 | October 11, 2024 |
| Questions about vocab size, decoder start token, padding token, and appropriate config for custom seq2seq transformer model without any tokenizer | 0 | 10 | October 11, 2024 |
| Repeat Yourself - 🤗 Transformers Design Philosophy | 12 | 2520 | October 10, 2024 |
| Any Multi Modal LLMs that take direct pdf + text as input? | 2 | 44 | October 10, 2024 |
| Training CausalLM to imitate Seq2SeqModel | 2 | 553 | October 10, 2024 |
| valueError: Supplied state dict for layers does not contain `bitsandbytes__*` and possibly other `quantized_stats` (when load saved quantized model) | 1 | 19 | October 10, 2024 |
| Seq2seq padding | 1 | 17 | October 10, 2024 |
| Whisper for Audio Classification | 3 | 1788 | October 9, 2024 |
| Should I Include Poet Information as a Feature in LLM Training with 3,356 Unique Poets? | 0 | 35 | October 9, 2024 |
| UnboundLocalError: cannot access local variable 'input_ids' where it is not associated with a value | 1 | 40 | October 9, 2024 |
| Index Error while Summarizing splitted Documents | 7 | 28 | October 9, 2024 |
| Do I need to dequantization before merging the qlora | 10 | 59 | October 9, 2024 |
| How to do model.generate() in evaluation steps with Trainer + FSDP? | 4 | 2339 | October 8, 2024 |
| How can I obtain the logits via model.generate()? | 2 | 27 | October 8, 2024 |
| Fine Tune with/without LORA | 1 | 21 | October 7, 2024 |
| Not getting substantial training time improvement with LORA - is this expected? | 1 | 618 | October 7, 2024 |
| How to properly load the PEFT LoRA model | 3 | 5013 | October 7, 2024 |
| BitsAndBytes With DDP | 3 | 20 | October 7, 2024 |
| How to Configure LLaMA-3:8B on HuggingFace to Generate Responses Similar to Ollama? | 7 | 113 | October 7, 2024 |