| Topic | Replies | Views | Activity |
|---|---|---|---|
| Reducing unwanted generation in Gemma 3 | 7 | 218 | April 5, 2025 |
| Difference between pre-training and fine tuning with language modeling to instill new knowledge | 3 | 76 | April 3, 2025 |
| What is the most efficient way to dynamically change context mid-generation? | 4 | 31 | April 2, 2025 |
| 🚀 Introducing FlashTokenizer: The World's Fastest CPU Tokenizer! | 2 | 27 | April 4, 2025 |
| Using DistributedSampler with accelerate | 4 | 107 | April 2, 2025 |
| ValueError: Could not interpret optimizer identifier | 1 | 178 | April 1, 2025 |
| Model_accepts_loss_kwargs detection based on **kwargs is too permissive | 0 | 41 | April 1, 2025 |
| Limit mask size in Mask2Former results | 1 | 25 | April 1, 2025 |
| Args in RewardConfig | 1 | 15 | April 1, 2025 |
| FASTAI:TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType) | 2 | 40 | March 29, 2025 |
| Optimize GPU Usage for Long-Context Training | 2 | 60 | March 28, 2025 |
| The hidden_states when i use model.generate | 4 | 1663 | March 28, 2025 |
| Fixing the random seed in the Trainer does not produce the same results across runs | 5 | 17302 | March 27, 2025 |
| The size of tensor a (882) must match the size of tensor b (568) at non-singleton dimension 1 | 2 | 56 | March 27, 2025 |
| RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! | 2 | 103 | March 25, 2025 |
| Unexpected behavior of load_best_model_at_end in Trainer (or am I doing it wrong?) | 2 | 35 | March 25, 2025 |
| Load_best_model_at_end doesn't work? | 1 | 81 | March 25, 2025 |
| Molformer model training error | 6 | 27 | March 25, 2025 |
| Extract Attention Weights from a Specific Layer and Head Efficiently | 1 | 79 | March 25, 2025 |
| Clarification on Commercial License Impact of LayoutLMv3ImageProcessor within UdopProcessor | 0 | 23 | March 24, 2025 |
| Runtime Error: Cuda Initialization | 13 | 138 | March 24, 2025 |
| Reasoning LLM Benchmarking | 2 | 714 | March 24, 2025 |
| Web worker fails to process input data | 1 | 24 | March 22, 2025 |
| Adding dropout in custom model, but setting dropout through .from_pretrained() | 2 | 48 | March 21, 2025 |
| Multimodal training | 4 | 47 | March 21, 2025 |
| Target branch/tag/commit for automatic Hub pushes | 1 | 13 | March 21, 2025 |
| Two questions when I wraped the AutoModelForMaskedLM | 7 | 27 | March 21, 2025 |
| One question is about the pretrain method in Transformer packge ? | 1 | 202 | March 19, 2025 |
| Partially loss calculation with transformers LLM Trainer and DataCollator | 1 | 67 | March 19, 2025 |
| Custom VLM - Swapping a vision encoder from a VLM | 1 | 117 | March 19, 2025 |