Intermediate

Topic	Replies	Views	Activity
Donut inference at production	0	411	November 11, 2022
Output embedding from each self-attention head from each encoder layer	0	410	February 28, 2022
LayoutLMv3 processor error	4	103	September 27, 2024
Infilling multiple mask spans with BartForConditionalGeneration	0	408	July 12, 2022
How to change BERT attention value during testing	0	407	October 6, 2021
Is it possible to run the encoder part and decoder part of a NLG model as 2 steps?	0	405	January 26, 2022
Can I do a DPO training on a synthetic dataset?	0	404	December 6, 2023
Default for the Decoder past_key_values - Marian	0	403	January 5, 2023
Fixed output length "summarization"/"question-answering"	0	402	October 6, 2022
T5 extractive behavior	0	402	February 28, 2022
SAMModel output size different to the input	2	231	June 6, 2024
Similarity search based on multiple text attributes	0	398	December 4, 2023
Create speech to text training dataset using text to speech model	0	398	February 8, 2023
Unable to train a good model after using exclude_from_weight_decay	0	397	October 19, 2021
Combine LORA with full finetuning	0	392	September 4, 2023
Why does the ViT change the logging setup in my code?	0	392	October 26, 2022
How do I fix this error when training in TRL with QLora and PPO?	0	390	April 13, 2024
Model inferencing is blocking the main fastapi thread	1	49	March 28, 2025
Function Call via HuggingFaceLLM	1	275	August 22, 2024
DocVQA test dataset evaluation on qwen2.5-VL-3B	0	69	February 16, 2025
Generate token by token for m2m100_418	0	387	February 6, 2024
Why use `val_transforms()` function in image classification example instead of `feature_extractor`?	0	386	July 4, 2022
Loading extra memory in GPU 0 using DDP	0	384	June 18, 2023
Does Trainer.train repeat streaming dataset when max_steps is not reached?	0	382	May 26, 2023
Huggingface infinity based inference server vs AWS Inferentia	0	381	July 21, 2022
Why I can't use or can't pass past_key_values = DynamicCache() into Llama 3 model	1	269	October 8, 2024
How to classify a paragraph to different category descriptions given in sentences/list?	0	380	March 29, 2023
Audio upsampling on-the-fly	0	378	July 4, 2023
How to import wav2vec fine tuned model to scala	0	378	August 1, 2021
BERT model not showing up as trainable in Flax	0	376	June 27, 2022
News topic classifier	0	375	August 8, 2021
How to create multiple MCP server hosted on single endpoint with different Routes	1	51	June 18, 2025
How to obtain latent vectors from model with transformers	1	263	April 9, 2024
Sequence to sequence model	0	66	November 22, 2024
Batch size TPUv4	0	371	November 4, 2022
Unable to Finetune Deberta	0	369	October 26, 2022
Comparison of methods for large token inputs	0	367	July 5, 2023
Question Answering Prediction without answer	0	367	December 31, 2022
Code example of getting cross attention from T5?	0	365	February 15, 2023
Same sequence maps to different token ids	0	365	August 29, 2022
Training for langgraph agent	0	364	July 11, 2024
Unable to apply transfer learning to certain models	0	364	March 23, 2021
CodeLlama LlamaForSequenceClassification	0	361	October 16, 2023
Pad token vs -100 index_id	2	37	April 1, 2025
8-bit t5-models in the Widgets	0	360	November 2, 2022
Tabular Data Autoencoder Loss Plateau	0	360	September 28, 2021
How to use DeepSparse in Transformer?	1	253	March 11, 2024
How to add attention map between words and tags	0	357	June 13, 2021
Why there is no open source hub for training pipelines on huggingface?	0	356	August 26, 2022
Tensorboard support when using optimizer with 2 separate learning rates	0	356	October 9, 2021