🤗Transformers

Topic	Category	Replies	Views	Activity
RGBA -> RGB default background color vs padding color	🤗Transformers	1	11	May 30, 2025
Why is Static Cache latency high?	🤗Transformers	2	29	May 29, 2025
Error using Trainer with Colab notebook, anyone have a solution?	🤗Transformers	1	112	May 29, 2025
LoRA training with accelerate / deepspeed	DeepSpeed	3	2490	May 28, 2025
How do Q, K, V differ in LLMs?	🤗Transformers	1	29	May 28, 2025
Prompt caching in pipelines	🤗Transformers	1	81	May 27, 2025
How does Llama For Sequence Classification determine what class corresponds to what label?	🤗Transformers	10	5162	May 25, 2025
Best practice for usage of Data Collator For CompletionOnlyLM in multi-turn chat	🤗Transformers	2	940	May 25, 2025
How to merge fine-tuned LLaMA-3.1-8B (via LLaMA-Factory) into a single GGUF for LM Studio?	🤗Transformers	2	85	May 25, 2025
Generate keeps increasing memory usage on Ubuntu	🤗Transformers	6	67	May 25, 2025
How does the Transformers library work under the hood?	🤗Transformers	1	16	May 22, 2025
Identical Evaluation Metrics for SFT & DPO–Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B	🤗Transformers	1	38	May 22, 2025
Create a weighted loss function to handle imbalance?	🤗Transformers	3	1968	May 21, 2025
Incorrect total train batch size when using tp_size > 1 and deepspeed	DeepSpeed	1	86	May 20, 2025
How do I load a trained checkpoint model?	🤗Transformers	1	91	May 20, 2025
Fine-tuning on Qwen3	🤗Transformers	2	1252	May 19, 2025
TokenClassificationPipeline produces entities with "##" characters	🤗Transformers	6	25	May 19, 2025
PPO Training does not improve SFT model outputs (Metrics identical before and after PPO)	🤗Transformers	1	56	May 19, 2025
CUDA out of memory in SD3	🤗Transformers	4	34	May 16, 2025
AttributeError: 'CustomQwen3Model' object has no attribute 'config'	🤗Transformers	1	16	May 16, 2025
How to freeze layers while fine-tuning?	🤗Transformers	2	328	May 16, 2025
Trainer default distributed training behaviour	🤗Transformers	2	44	May 15, 2025
What does increasing the number of heads do in Multi-head Attention?	🤗Transformers	5	30493	May 15, 2025
Does a high number of output labels affect the performance of BERT, and how to handle the class imbalance issue while doing multi text classification?	🤗Transformers	2	433	May 14, 2025
Mamba2 Cache Position	🤗Transformers	4	171	May 12, 2025
Building something that helps people who really need help using AI	🤗Transformers	5	42	May 12, 2025
(first token generation puzzle) Why does transformers take the last dimension as output when generating the first token in the language generation process?	🤗Transformers	9	2145	May 11, 2025
Transformers: Informer model use for weather forecasting	🤗Transformers	1	24	May 9, 2025
Resolving "Cannot Perform Fine-Tuning on Purely Quantized Models" Error in Falcon Model Training?	🤗Transformers	4	9341	May 9, 2025
How to resume training from a checkpoint using huggingface trainer	🤗Transformers	5	267	May 8, 2025