Topic | Replies | Views | Activity
What is `self.loss_function` in `forward()` of newly released LLM? | 0 | 22 | January 14, 2025
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length | 4 | 34768 | January 13, 2025
Qwen Not work anymore | 1 | 124 | January 13, 2025
No matter what I do the HF… | 2 | 25 | January 13, 2025
Expected `tensors` and `new_tensors` to have the same type but found <class 'tuple'> and <class 'torch.Tensor'> | 2 | 12 | January 12, 2025
Fine-tuning an NLLB model for a new language | 7 | 2229 | January 12, 2025
Preparing data for Donut training results in error "ArrowInvalid: offset overflow while concatenating arrays" | 2 | 231 | January 12, 2025
Coherforai I have API but I can't Access | 1 | 18 | January 10, 2025
Llama-2 7B-hf repeats context of question directly from input prompt, cuts off with newlines | 16 | 28372 | January 10, 2025
Mamba2 Cache Position | 3 | 67 | January 10, 2025
Multi-input tag and, multi-label output for token classification using Bert pretrained model | 1 | 54 | January 9, 2025
TypeError: 'list' object is not callable | 1 | 19 | January 8, 2025
Generate() speculative decoding with static string | 0 | 15 | January 7, 2025
Unable to Run Sentence Transformer Text embedding in Docker | 1 | 188 | January 7, 2025
How to output loss from model.generate()? | 16 | 5742 | January 7, 2025
Labels in Audio Frame classification task (Wav2Vec2 For Audio Frame Classification) | 1 | 641 | January 7, 2025
The effect of padding_side | 11 | 12699 | January 7, 2025
All-mpnet-base-v2 get different results in Spark-NLP vs SentenceTransformers | 2 | 38 | January 6, 2025
Loss not Decreasing: Hiera MAE Pretraining from Scratch | 0 | 18 | January 6, 2025
Is there anyway to modify the Trainer eval_loop aggregate function? | 1 | 16 | January 6, 2025
NotImplementedError: Cannot copy out of meta tensor; no data! | 3 | 6206 | January 4, 2025
Timeout Issue with DeepSpeed on Multiple GPUs | 1 | 317 | January 3, 2025
How to Use torch_directml GPU with Transformers.Trainer for Fine-Tuning? | 0 | 62 | January 2, 2025
-inf values for logit score outputs with model.generate | 3 | 744 | January 2, 2025
Trainer warning with the new version | 2 | 4158 | January 2, 2025
ValueError: Make sure that you pass in as many target sizes as the batch dimension of the logits | 2 | 75 | January 1, 2025
Training loss no drop for MT5ForSequenceClassification | 2 | 182 | January 1, 2025
Batched Generation with Flash Attention | 1 | 303 | December 30, 2024
Using persistent storage on HF spaces | 3 | 165 | December 30, 2024
Trainer API object detection | 2 | 34 | December 29, 2024