Intermediate

Topic	Replies	Views	Activity
Fine-tuning with Different Model Heads	4	753	April 30, 2024
Creating sharded IterableDataset from a list of IterableDatasets?	2	543	July 2, 2024
An error I've been trying to fix for days now	4	419	November 19, 2024
Setting seed within model.generate()	0	296	November 11, 2024
Fine-tuning RoBERTa: got an unexpected keyword argument 'labels'	2	957	May 1, 2024
How can I use evaluate's perplexity metric on a model that's already loaded?	0	1656	July 28, 2023
Cannot Merge Lora weights back to the base model	8	305	October 29, 2024
FAQ question generation and answering using few shot learning	1	1144	March 14, 2023
Using a fixed vocabulary?	2	929	November 8, 2021
How do I deploy Gradio app with Kubernetes?	0	1578	October 15, 2022
Train loss goes to zero after some epochs	0	280	August 11, 2023
Interpreting train_loss/val_loss Plot	3	786	March 24, 2023
WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time when adding rouge-score	0	1572	February 14, 2022
One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior	0	1568	February 21, 2021
Weird output from model.generate()	1	1106	September 21, 2023
Fine tuning facebook/bart-large-mnli zeroshot classifier	2	901	June 30, 2023
Open-sourcing better cross-encoders for STILTS and better IR?	2	900	October 9, 2021
Network is Unreachable Error	0	1541	July 26, 2022
Generating sentence embeddings from pretrained transformers model	1	1089	January 22, 2021
Probsparse_attention in Informer	3	756	March 31, 2023
BERT Split NER Labeling	1	1052	December 7, 2021
A new dataset for multi-label text classification	1	1039	September 30, 2021
Which weights change when fine-tuning a pre-trained model?	3	733	June 11, 2024
Constrain output format from beam search in Donut doc classification	4	647	September 30, 2022
Generating [PAD] tokens during GPT2 inference	0	1423	August 22, 2022
`serving` signature in TensorFlow Serving blogpost	2	820	August 9, 2021
How to understand the answer_start parameter of Squad dataset for training BERT-QA model + practical implications for creating custom dataset?	1	1001	September 1, 2023
Get well adjusted confidence scores from similarity of CLIP encodings	1	558	July 25, 2024
Using XLA fast text generation with Pegasus models	5	569	August 25, 2022
Training for GPTQ, possible?	1	982	October 24, 2023
Sampling: what's the secret sauce?	2	797	August 22, 2022
Primer on Fine Tuning Text generation models (like GPT)	0	1380	November 14, 2022
Is there any way to avoid CPU bottlenecks when doing single prompt inference?	1	968	June 12, 2023
Trouble loading checkpoint shards for microsoft/Phi-3-mini-4k-instruct	1	966	May 5, 2024
Using .generate with TAPAS as encoder in EncoderDecoder	4	610	January 18, 2022
Transformer's output as input to other model	4	610	March 27, 2021
Training Loss 0.0000 and Validation Loss nan	2	140	March 12, 2025
How do Sequence to Sequence architectures (BART, LED) learn the end of generation?	2	781	February 14, 2022
Train Roberta from scratch for custom dataset	1	945	May 2, 2023
Read data in PDF or image format as part of the prompt	0	1333	May 29, 2023
Identifying and getting right embeddings from the fine tuned BERT on domain specific data	0	1328	September 8, 2021
TGI with guidance generates weird output when asked to answer in a "structured" way	3	117	February 17, 2025
Giving attention mask to ppo_trainer	0	233	May 4, 2024
Decicoder finetune error: understanding naive_attention_prefill	1	520	September 17, 2023
Save custom transformer as PreTrainedModel	1	924	September 7, 2021
🤪 Deploying huggingface models to Chai	1	513	April 29, 2021
Interpreting logs by the trainer	1	910	May 19, 2023
Invalid image format	2	418	October 29, 2024
Mistral - Sentence classification - mat1 and mat2 shapes cannot be multiplied	4	574	November 5, 2024
Distributed inference for datasets created on the fly	3	641	October 10, 2023