| Topic | Replies | Views | Last activity |
| --- | --- | --- | --- |
| Saving model in safetensors format through Trainer fails for Gemma 2 due to shared tensors | 5 | 1539 | September 30, 2024 |
| How can I check the implementation of tokenizer.decode() | 6 | 60 | September 30, 2024 |
| DeepSpeed MII pipeline issue | 1 | 35 | September 30, 2024 |
| [MarianTokenizer] Clarify the use of the vocab parameter | 3 | 818 | September 29, 2024 |
| Convert pre-trained MHA weights to GQA weights | 1 | 384 | September 29, 2024 |
| Training from a checkpoint and freezing some of model's parameters | 2 | 855 | September 29, 2024 |
| Deepspeed mii library issues | 1 | 78 | September 29, 2024 |
| How to use Qwen2-VL on multiple gpus? | 2 | 1510 | September 28, 2024 |
| Using gradient checkpointing and KV caching when generation happens in no grad context | 2 | 326 | September 28, 2024 |
| Multi-GPU inference with LLM produces gibberish | 14 | 6618 | September 28, 2024 |
| How memory is managed in model.generate() method? | 2 | 50 | September 27, 2024 |
| Cannot pin 'torch.cuda.LongTensor' only dense CPU tensors can be pinned | 1 | 1175 | September 26, 2024 |
| Unable to free whole GPU memory even after ``del var; gc.collect; empty_cache()`` | 8 | 635 | September 26, 2024 |
| How can I speedup T5 load? | 1 | 31 | September 26, 2024 |
| How to add model repo's snapshots to the Hugging Face cache? | 1 | 192 | September 25, 2024 |
| Alternative Language Modeling Loss Calculation | 0 | 83 | September 25, 2024 |
| Error in pad() function of transformers/tokenization_utils_base.py | 4 | 198 | September 24, 2024 |
| RuntimeError with Mixed Precision during LoRA Fine-Tuning in LLAVA on Small GPU Machine | 1 | 260 | September 23, 2024 |
| Extract data from html page and extract pre-structured JSON | 1 | 623 | September 23, 2024 |
| Question about training_step Function in Class Trainer | 0 | 41 | September 23, 2024 |
| NotImplementedError: ggml_type 21 not implemented | 2 | 88 | September 23, 2024 |
| Huggingface website issue with links in the documentation | 0 | 12 | September 22, 2024 |
| QLoRA with GPTQ | 3 | 2051 | September 22, 2024 |
| Questions about training bert with two columns data | 0 | 34 | September 21, 2024 |
| Blip-2 for extraction of image and text embeddings | 0 | 675 | September 20, 2024 |
| Llama 3.1 70-B run on 32 GB Vram? | 5 | 4002 | September 20, 2024 |
| CUDA Memory issue for model.generate() in AutoModelForCausalLM | 2 | 1459 | September 20, 2024 |
| Is prompt properly implemented in the whisper model? | 1 | 1609 | September 19, 2024 |
| Trainer and Accelerate | 13 | 10547 | September 19, 2024 |
| A custom trainer for multi-task learning? | 2 | 847 | September 18, 2024 |