Topic | Replies | Views | Activity
How does GPT decide to stop generating sentences without EOS token? | 13 | 24634 | August 19, 2024
Is model.generate slower than model forward call? | 1 | 186 | August 18, 2024
DeepSpeed Zero 3 with LoRA - Merging adapters | 1 | 739 | August 16, 2024
"You cannot perform fine-tuning on purely quantized models." error in LoRA model training? | 3 | 2745 | August 16, 2024
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [1, 128]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed t | 1 | 1738 | August 16, 2024
Summarization on long documents | 63 | 59160 | August 16, 2024
Bitsandbytes `has_fp16_weights` issue | 1 | 190 | August 15, 2024
Dola_layers attribute missing in GenerationConfig | 0 | 52 | August 14, 2024
Upload custom Llama2 model with injected linear layers | 1 | 641 | August 14, 2024
Select Source and Target Language in multi-language translation models | 1 | 379 | August 14, 2024
Model not getting loaded | 1 | 129 | August 13, 2024
Frameworks for Benchmarking Transformers' Inference? | 1 | 380 | August 13, 2024
"ValueError: Unrecognized model type" when loading my trained custom model | 3 | 2775 | August 13, 2024
Recency-aware finetuning for question answering | 2 | 65 | August 13, 2024
RemoveColumnsCollator is removing all columns | 4 | 739 | August 12, 2024
Can't set pad_token by adding special token to Llama's tokenizer | 4 | 5959 | August 12, 2024
Handling Peft Model the right way (save, load, inference) | 0 | 137 | August 10, 2024
Multiple gpu training | 1 | 2731 | August 10, 2024
Embeddings from fine-tuned ModelForSequenceClassification | 0 | 64 | August 9, 2024
Is it possible to use KenLM with Whisper? | 0 | 97 | August 9, 2024
Segmentation fault (core dumped) | 13 | 18979 | August 9, 2024
ASR Model Tokenizer Won't Load | 0 | 74 | August 8, 2024
How to specify the gpu number to load the input during the inference of huggingface pipeline in a multi-gpu setup? | 2 | 589 | August 8, 2024
Adapting BLIP2 for zero-shot classification | 3 | 1500 | August 8, 2024
For the Seq2SeqTrainingArguments class, what happens when I set both adafactor=True and set a learning rate? | 1 | 429 | August 6, 2024
How to see what part of model are offloaded to CPU? | 1 | 138 | August 7, 2024
Transformer trackers pretrained weights | 0 | 16 | August 6, 2024
Track include_num_input_tokens_seen in Trainer | 0 | 159 | August 6, 2024
Load Phi 3 small on Nvidia Tesla V100 - Flash Attention | 3 | 1056 | August 6, 2024
Idefics2 multi turn inference | 0 | 148 | August 5, 2024