Intermediate

Topic	Replies	Views	Activity
Fine Tuning A sentence transformer model with my own data	2	3027	April 17, 2024
DeepSpeed giving Assertion Error	2	2962	July 22, 2023
What is the limit of grad accumulation?	2	2906	May 4, 2021
Difference between GAT and Transformer?	0	886	April 7, 2022
What is the difference between triplet loss and contrastive loss?	1	1975	June 18, 2022
Loading models sometimes maxes DISK%, then crashes	2	2867	October 8, 2020
Why is it so slow to access data through iteration with a Hugging Face dataset?	2	2841	July 21, 2022
How to fine-tune an LLM to support function calling	0	874	November 15, 2023
how to convert text to word embeddings using bert's pretrained model 'faster'?	1	3466	January 4, 2021
Summariser pipeline giving different results on same model with fixed seed	0	870	August 17, 2022
Run training script in DDP using GLOO	1	1941	August 17, 2022
Fine-tuning a QA model on the SQuAD 2 dataset with more than one answer	2	878	November 6, 2024
Using GPT-Neo-125M with ONNX	3	1348	July 5, 2022
Model validation failed - Target is multiclass but average='binary'	2	2705	January 21, 2024
Specify attention masks for some heads in multi-head attention	3	2335	November 17, 2020
BPEDecoder no spaces after special tokens	4	2032	April 19, 2023
Perplexity from fine-tuned GPT2LMHeadModel with and without lm_head as a parameter	4	2030	May 10, 2022
Fine-tuning Mistral/Mixtral for sequence classification on long context	2	2606	May 29, 2024
Convert models to Longformer	3	2189	February 1, 2021
FineTune LLM for regex	3	2139	April 21, 2024
Load Custom Model	8	1424	November 21, 2022
Image similarity	2	2435	March 31, 2023
Segmentation fault (Core dumped) with datasets	2	2417	July 9, 2021
Deploying Seq2Seq using ONNX on GPU	0	743	March 24, 2022
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['tokens', 'id', 'space_after', 'ner_tags', 'ner_ids']	2	2391	April 21, 2023
Add_faiss_index with multiple columns	0	731	August 19, 2023
Converting GPT2 to JavaScript?	1	1630	April 17, 2021
Combine multiple LoRAs for group photo?	1	515	January 3, 2025
How to exclude layers in weight decay	1	2874	October 18, 2021
Accelerated Inference API not taking parameters?	5	1633	October 26, 2022
Push model to hugging face hub without Trainer	7	1406	May 14, 2024
Linear learning rate despite lr_scheduler_type="polynomial"	4	1763	September 2, 2021
TGI and turn off Flash Attention v2	4	1748	August 23, 2024
DPO training data format	7	1375	September 23, 2024
Using TRL on TPU	1	155	February 11, 2025
Batched BertForMaskedLM inference loss issue	0	688	February 23, 2022
Preprocessing for T5 Denoising	1	2713	May 20, 2021
GPTQ+PEFT model running very slowly at inference	4	1686	October 24, 2023
Open-LLM-Leaderboard for dummies	3	327	December 30, 2024
How to generate on multiple GPUs	3	1836	August 30, 2022
Multinode DeepSpeed T5 Experiment Issues with Hf-Trainer	2	1159	August 3, 2022
AttributeError: LayoutLMTokenClassification object has no attribute 'config'	3	1775	August 13, 2022
How to concatenate the word embedding for special tokens and words	1	2510	June 13, 2021
Properly loading a fine tuned model from directory	2	2040	August 25, 2020
How to continue to pre-train gpt2?	2	2031	July 1, 2023
Cache Proxy - Like with Docker Registries	1	444	October 21, 2024
DeBERTaV3 ONNX conversion error	2	2028	July 25, 2022
I fine-tuned a LLaMA 7B on a custom dataset; the response from inference generation starts well, then words start to connect without spaces	4	1544	July 19, 2023
What is the official way to run a wandb sweep with hugging face (HF) transformers?	2	1995	July 25, 2023
Finetuning LLama2-70B using 4-bit quantization on multi-GPU using Deepspeed ZeRO	1	2405	March 19, 2024