RGBA -> RGB default background color vs padding color | 1 | 11 | May 30, 2025
Why is Static Cache latency high? | 2 | 29 | May 29, 2025
Error using Trainer with Colab notebook, anyone have a solution? | 1 | 112 | May 29, 2025
LoRA training with accelerate / deepspeed | 3 | 2490 | May 28, 2025
How does Q, K, V differ in LLM? | 1 | 29 | May 28, 2025
Prompt caching in pipelines | 1 | 81 | May 27, 2025
How does Llama For Sequence Classification determine what class corresponds to what label? | 10 | 5162 | May 25, 2025
Best practice for usage of Data Collator For CompletionOnlyLM in multi-turn chat | 2 | 940 | May 25, 2025
How to merge fine-tuned LLaMA-3.1-8B (via LLaMA-Factory) into a single GGUF for LM Studio? | 2 | 85 | May 25, 2025
Generate keeps increasing memory usage on ubuntu | 6 | 67 | May 25, 2025
How does Transformers Library work under the hood? | 1 | 16 | May 22, 2025
Identical Evaluation Metrics for SFT & DPO-Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B | 1 | 38 | May 22, 2025
Create a weighted loss function to handle imbalance? | 3 | 1968 | May 21, 2025
Incorrect total train batch size when using tp_size > 1 and deepspeed | 1 | 86 | May 20, 2025
How do I load a trained checkpoint model? | 1 | 91 | May 20, 2025
Fine tuning on qwen3 | 2 | 1252 | May 19, 2025
TokenClassificationPipeline produce entities with "##" characters | 6 | 25 | May 19, 2025
PPO Training does not improve SFT model outputs (Metrics identical before and after PPO) | 1 | 56 | May 19, 2025
Cuda out of memory in SD3 | 4 | 34 | May 16, 2025
AttributeError: 'CustomQwen3Model' object has no attribute 'config' | 1 | 16 | May 16, 2025
How to freeze layers while fine-tuning? | 2 | 328 | May 16, 2025
Trainer default distributed training behaviour | 2 | 44 | May 15, 2025
What does increasing number of heads do in the Multi-head Attention? | 5 | 30493 | May 15, 2025
Does high number of output labels affect the performance of BERT and how to handle the class imbalance issue while doing multi text classification? | 2 | 433 | May 14, 2025
Mamba2 Cache Position | 4 | 171 | May 12, 2025
Building something that help people who really need help using ai | 5 | 42 | May 12, 2025
(first token generation puzzle) Why does transformers take the last dimension as output when generating the first token in language generation process? | 9 | 2145 | May 11, 2025
Transformers: Informer model use for weather forecasting | 1 | 24 | May 9, 2025
Resolving "Cannot Perform Fine-Tuning on Purely Quantized Models" Error in Falcon Model Training? | 4 | 9341 | May 9, 2025
How to resume training from a checkpoint using huggingface trainer | 5 | 267 | May 8, 2025