🤗Transformers

Topic	Replies	Views	Activity
Grouping by length makes training loss oscillate and makes evaluation loss worse 🤗Transformers	2	231	June 3, 2025
How can LLMs be fine-tuned for specialized domain knowledge? 🤗Transformers	2	278	June 3, 2025
Implementing Triplet loss in ViT 🤗Transformers	1	24	June 3, 2025
Using Hugging Face for computer vision (TensorFlow)? 🤗Transformers	3	406	June 2, 2025
ValueError: Supplied state dict for layers does not contain `bitsandbytes__*` and possibly other `quantized_stats` (when loading a saved quantized model) 🤗Transformers	4	766	May 30, 2025
RGBA -> RGB default background color vs padding color 🤗Transformers	1	9	May 30, 2025
Why is Static Cache latency high? 🤗Transformers	2	18	May 29, 2025
Error using Trainer with Colab notebook, anyone have a solution? 🤗Transformers	1	65	May 29, 2025
LoRA training with accelerate / deepspeed DeepSpeed	3	2308	May 28, 2025
How does Q, K, V differ in LLM? 🤗Transformers	1	20	May 28, 2025
The effect of padding_side 🤗Transformers	13	14786	May 27, 2025
Prompt caching in pipelines 🤗Transformers	1	48	May 27, 2025
How does Llama For Sequence Classification determine what class corresponds to what label? 🤗Transformers	10	4917	May 25, 2025
Best practice for usage of Data Collator For CompletionOnlyLM in multi-turn chat 🤗Transformers	2	670	May 25, 2025
How to merge fine-tuned LLaMA-3.1-8B (via LLaMA-Factory) into a single GGUF for LM Studio? 🤗Transformers	1	40	May 25, 2025
Generate keeps increasing memory usage on Ubuntu 🤗Transformers	6	39	May 25, 2025
How does Transformers Library work under the hood? 🤗Transformers	1	15	May 22, 2025
Identical Evaluation Metrics for SFT & DPO–Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B 🤗Transformers	1	19	May 22, 2025
Create a weighted loss function to handle imbalance? 🤗Transformers	3	1250	May 21, 2025
Incorrect total train batch size when using tp_size > 1 and deepspeed DeepSpeed	1	45	May 20, 2025
How do I load a trained checkpoint model? 🤗Transformers	1	52	May 20, 2025
Fine-tuning on Qwen3 🤗Transformers	2	654	May 19, 2025
TokenClassificationPipeline produces entities with "##" characters 🤗Transformers	6	25	May 19, 2025
PPO Training does not improve SFT model outputs (Metrics identical before and after PPO) 🤗Transformers	1	41	May 19, 2025
Cuda out of memory in SD3 🤗Transformers	4	27	May 16, 2025
AttributeError: 'CustomQwen3Model' object has no attribute 'config' 🤗Transformers	1	12	May 16, 2025
How to freeze layers while fine-tuning? 🤗Transformers	2	133	May 16, 2025
Trainer default distributed training behaviour 🤗Transformers	2	20	May 15, 2025
What does increasing number of heads do in the Multi-head Attention? 🤗Transformers	5	29950	May 15, 2025
Does high number of output labels affect the performance of BERT and how to handle the class imbalance issue while doing multi text classification? 🤗Transformers	2	419	May 14, 2025