Topic | Replies | Views | Activity
TRL SFTTrainer 0.15 compute_token_accuracy error | 2 | 45 | March 18, 2025
How can I set `max_memory` parameter while loading Quantized model with Model Pipeline class? | 2 | 23 | March 18, 2025
E5 embedding models | 1 | 11 | March 17, 2025
Using TableTransformer in Standalone Mode Without Hugging Face Hub Access | 1 | 17 | March 17, 2025
The checkpoint you are trying to load has model type `gemma2` but Transformers does not recognize this architecture | 8 | 6173 | March 17, 2025
Get each generated token last layer hidden state | 3 | 25 | March 16, 2025
Why does automodelforcausallm.from_pretrained() work on base models and not instruct models? | 4 | 31 | March 15, 2025
[Possibly] Forgotten TODO Comment for `TrainingArguments.default_optim` | 1 | 20 | March 14, 2025
Metrics for Training Set in Trainer | 11 | 25209 | March 14, 2025
How can LLMs be fine-tuned for specialized domain knowledge? | 1 | 90 | March 14, 2025
Corrupted deepspeed checkpoint | 1 | 27 | March 13, 2025
Model does not exist, inference API don't work | 5 | 127 | March 13, 2025
Q&A the stock prediction | 1 | 1192 | January 7, 2024
Difference BertModel, AutoModel and AutoModelForMaskedLM | 8 | 4750 | March 9, 2025
Injecting Multiple Modalities into a Transformer Decoder via Cross-Attention | 1 | 19 | March 9, 2025
Support for LLaMA in EncoderDecoder framework | 1 | 510 | March 8, 2025
SFTTrainer Doubling Speed on a Single GPU with DeepSpeed: Proposal for an Update to the Official Documentation and Verification Report | 1 | 30 | March 7, 2025
After fine tuning openai whisper model, there shows OSError WinError 123 | 1 | 11 | March 7, 2025
About Hyperparameter Search with Ray Tune | 2 | 17 | March 7, 2025
Trainer.train() runs for long and appears to be stuck. How do I know that it's processing and not in loop | 2 | 445 | March 7, 2025
As of transformers v4.44, default chat template is no longer allowed | 2 | 1991 | March 7, 2025
Multi Objective Hyperparameter Optimization | 3 | 22 | March 7, 2025
Repetitive Token Generation During Evaluation in Fine-Tuned LLaMA Model | 1 | 18 | March 6, 2025
Looks like the new transformer 4.49.0 has some issues | 3 | 143 | March 6, 2025
Don't apply complete | 1 | 10 | March 6, 2025
Speculative Decoding with Qwen Models | 1 | 87 | March 5, 2025
SSL Certificate Issue | 8 | 23384 | March 5, 2025
How to correctly count downloads in revisions in the transformers hub | 1 | 9 | March 3, 2025
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer when lora training | 7 | 69 | March 3, 2025
Can the image processor for instance segmentation be adapted to work with stacks of masks? | 0 | 8 | March 2, 2025