Intermediate

Topic	Replies	Views	Activity
Falcon-7b-instruct ALWAYS returns SHORT ANSWERS on inference endpoint	1	906	September 5, 2023
How do I finetune Blip2 model on a custom dataset?	1	508	October 1, 2024
Fine tune model='facebook/bart-large-mnli'	0	1270	May 16, 2022
HF Dataset as a Replay Buffer for RL applications	6	480	March 9, 2023
How to correct TypeError: zip argument #1 must support iteration training in multiple GPU	1	893	February 28, 2023
Text classification on small dataset (8K)	1	893	July 27, 2021
Repeatedly decoding tokens multiple times after PEFT fine-tuning whisper	2	726	September 20, 2023
Opinion: Training Argument Fine Tuning MLM RoBERTa	1	157	January 9, 2025
Reduced inference f1 score with QLoRA finetuned model	1	880	September 6, 2023
Optimal methods to monitor attention matrices when doing training/inference using BERT-type models	2	709	September 11, 2021
How to implement Key Query Layer Normalized Transformers/LLMs in Huggingface?	0	1225	February 18, 2023
How to load my own pretrained model to huggingface code	1	859	January 31, 2023
Fine tune gpt2 language model error - unexpected keyword argument 'cache_dir'	0	1205	September 8, 2020
How to get a model on patent data for question answering	1	851	October 15, 2021
Adapter Training - Merging Weights	0	1189	April 6, 2023
Pyannotate pipeline() not working	6	254	January 9, 2025
How to implement LoRA with Pytorch?	0	1185	November 30, 2023
Getting completely different performance when trying to write a custom model	1	837	June 9, 2022
TokenClassification pipeline doing batch processing over a sequence of already tokenised messages	1	831	July 6, 2022
Inference Endpoints creation	1	467	January 14, 2024
Sequence Length in Continued Pretraining (MLM) & Masking Strategies	0	1173	January 6, 2022
Starting my first pull request, but transformers tests stall	3	584	November 24, 2021
Parameters for evaluation loop of a Seq2SeqTrainer model	0	1165	November 26, 2021
I Made a simple CLI for playing with BLOOM	2	663	September 24, 2022
Issues with Pythia model finetuning	1	811	October 3, 2023
Using same instructions for fine-tuning: Is this bad for the model?	1	456	March 26, 2024
QLoRA memory requirement with 3B model loads GPU with 10GB of memory with 4bit quantization	0	1143	December 19, 2023
Meta-Llama-3-8B-Instruct: "max_new_tokens" is not working for /v1/chat/completions	1	808	July 2, 2024
Inference Endpoints 401 Error	2	370	July 15, 2024
Sagemaker model parallelism- running the model results in Maximum recursion limit	7	400	July 24, 2023
Advice Needed for Training an Imbalanced Dataset AI Model: lr, Epochs, and neuronal Architecture	3	565	November 27, 2023
Positional Encoding error, Protein Bert Model	2	652	October 25, 2020
Is native Pytorch training loop much slower than Trainer?	4	504	November 11, 2024
Finetuning using Raytune: Failed to unpickle serialized exception	0	200	November 21, 2024
Tokenizer from a GGUF file in Python?	1	788	May 6, 2024
How to increase tokens text generation API	1	753	August 28, 2022
Can run_clm.py do early stopping?	2	614	August 25, 2022
GPU memory usage is twice (2x) what I calculated based on number of parameters and floating point precision	5	435	May 18, 2024
Does it ever make sense to finetune w fp32 if the base model was trained w fp16?	1	745	July 8, 2022
LlamaIndex Saving and Loading	1	743	January 2, 2024
Embeddings of added words	1	740	September 9, 2022
Experience with and extending LLM for software engineering	4	470	August 15, 2024
PydanticUserError: The `__modify_schema__` method is not supported in Pydantic v2. Use `__get_pydantic_json_schema__` instead in class `SecretStr`	1	414	January 22, 2025
Trainer with fp8 - what to use in accel CLI vs. TrainingArguments	1	731	April 24, 2024
Fine-tuning with LoRA; can't learn	0	1033	May 7, 2023
Forward and reverse detokenizing	1	728	December 26, 2021
How to add EOS when training T5?	1	129	October 21, 2024
Pipelines, Whisper and how to set parameters	1	724	January 12, 2024
Nested named entity recognition	2	587	March 19, 2024
Loading two models onto two gpus	0	1015	March 31, 2023