| Donut inference at production | 0 | 411 | November 11, 2022 |
| Output embedding from each self-attention head from each encoder layer | 0 | 410 | February 28, 2022 |
| LayoutLMv3 processor error | 4 | 103 | September 27, 2024 |
| Infilling multiple mask spans with BartForConditionalGeneration | 0 | 408 | July 12, 2022 |
| How to change BERT attention value during testing | 0 | 407 | October 6, 2021 |
| Is it possible to run the encoder part and decoder part of an NLG model as 2 steps? | 0 | 405 | January 26, 2022 |
| Can I do a DPO training on a synthetic dataset? | 0 | 404 | December 6, 2023 |
| Default for the Decoder past_key_values - Marian | 0 | 403 | January 5, 2023 |
| Fixed output length "summarization"/"question-answering" | 0 | 402 | October 6, 2022 |
| T5 extractive behavior | 0 | 402 | February 28, 2022 |
| SAMModel output size different to the input | 2 | 231 | June 6, 2024 |
| Similarity search based on multiple text attributes | 0 | 398 | December 4, 2023 |
| Create speech to text training dataset using text to speech model | 0 | 398 | February 8, 2023 |
| Unable to train a good model after using exclude_from_weight_decay | 0 | 397 | October 19, 2021 |
| Combine LORA with full finetuning | 0 | 392 | September 4, 2023 |
| Why does the ViT change the logging setup in my code? | 0 | 392 | October 26, 2022 |
| How do I fix this error when training in TRL with QLora and PPO? | 0 | 390 | April 13, 2024 |
| Model inferencing is blocking the main fastapi thread | 1 | 49 | March 28, 2025 |
| Function Call via HuggingFaceLLM | 1 | 275 | August 22, 2024 |
| DocVQA test dataset evaluation on qwen2.5-VL-3B | 0 | 69 | February 16, 2025 |
| Generate token by token for m2m100_418 | 0 | 387 | February 6, 2024 |
| Why use `val_transforms()` function in image classification example instead of `feature_extractor`? | 0 | 386 | July 4, 2022 |
| Loading extra memory in GPU 0 using DDP | 0 | 384 | June 18, 2023 |
| Does Trainer.train repeat streaming dataset when max_steps is not reached? | 0 | 382 | May 26, 2023 |
| Huggingface infinity based inference server vs AWS Inferentia | 0 | 381 | July 21, 2022 |
| Why I can't use or can't pass past_key_values = DynamicCache() into Llama 3 model | 1 | 269 | October 8, 2024 |
| How to classify a paragraph into different category descriptions given in sentences/list? | 0 | 380 | March 29, 2023 |
| Audio upsampling on-the-fly | 0 | 378 | July 4, 2023 |
| How to import wav2vec fine tuned model to scala | 0 | 378 | August 1, 2021 |
| BERT model not showing up as trainable in Flax | 0 | 376 | June 27, 2022 |
| News topic classifier | 0 | 375 | August 8, 2021 |
| How to create multiple MCP server hosted on single endpoint with different Routes | 1 | 51 | June 18, 2025 |
| How to obtain latent vectors from model with transformers | 1 | 263 | April 9, 2024 |
| Sequence to sequence model | 0 | 66 | November 22, 2024 |
| Batch size TPUv4 | 0 | 371 | November 4, 2022 |
| Unable to Finetune Deberta | 0 | 369 | October 26, 2022 |
| Comparison of methods for large token inputs | 0 | 367 | July 5, 2023 |
| Question Answering Prediction without answer | 0 | 367 | December 31, 2022 |
| Code example of getting cross attention from T5? | 0 | 365 | February 15, 2023 |
| Same sequence maps to different token ids | 0 | 365 | August 29, 2022 |
| Training for langgraph agent | 0 | 364 | July 11, 2024 |
| Unable to apply transfer learning to certain models | 0 | 364 | March 23, 2021 |
| CodeLlama LlamaForSequenceClassification | 0 | 361 | October 16, 2023 |
| Pad token vs -100 index_id | 2 | 37 | April 1, 2025 |
| 8-bit t5-models in the Widgets | 0 | 360 | November 2, 2022 |
| Tabular Data Autoencoder Loss Plateau | 0 | 360 | September 28, 2021 |
| How to use DeepSparse in Transformer? | 1 | 253 | March 11, 2024 |
| How to add attention map between words and tags | 0 | 357 | June 13, 2021 |
| Why there is no open source hub for training pipelines on huggingface? | 0 | 356 | August 26, 2022 |
| Tensorboard support when using optimizer with 2 separate learning rates | 0 | 356 | October 9, 2021 |