Intermediate

Topic	Replies	Views	Activity
Sampling strategies	1	515	April 4, 2023
Mapping text that describes connected devices to a JSON object with chosen shape	2	417	December 19, 2023
Hyperparameter-Search while adding Special tokens	1	508	January 28, 2024
Role of past_key_value in self attention	0	714	November 23, 2021
Unpacking transformer's trainer.eval() to see every example's output, loss	4	319	April 9, 2024
Having issues finetuning a Bert model pretrained from scratch on downstream task (GLUE Dataset)!	0	712	March 26, 2022
ValueError: {'code': None, 'message': 'ModelMetaclass object argument after ** must be a mapping, not str'	4	56	December 27, 2024
Accelerate: 'RobertaModel' object has no attribute 'roberta'	1	496	August 29, 2023
Text Web UI Generation Blanks (goes Black) on a Character	0	700	September 17, 2023
Trying to recreate `model.greedy_search()` for custom decoding of LLM output, but I am getting a different decoded output	3	349	February 8, 2024
Is iterative training advisable?	3	349	December 27, 2023
QA model with human like answers	1	489	February 4, 2023
Qlora - 8 bit quantization using bitsandbytes gives error for owl-vit model	1	488	April 12, 2024
XLMR-large not converging on Paws-X paraphrase dataset but mbert does	1	488	May 3, 2021
Load pre-trained models inside containerized pipeline for multi-lingual translation	0	690	November 16, 2022
DownloadAndLoadFlorence2Model 401 Client Error	1	273	February 10, 2025
Lack of pipeline parallelism examples for image-based transformers	3	61	October 7, 2024
Smart Batching - speed up Bert finetune	0	682	March 15, 2021
LLM Inference hosting issue	2	392	December 4, 2023
Approach for Creating a Real-Time Speech-to-Speech Model with Emotions, Laughter, and Crying—aka "The Perfect Voice Changer"	1	272	February 24, 2025
Identifying most useful domain-specific tokens for adding to the existing tokenizer	1	478	February 2, 2024
TRL + PPO + Using Conditioned Reference Model	3	60	January 27, 2025
Running low on GPU memory on a cluster with ESM2 lowest config	2	388	December 5, 2023
AttributeError: 'LangchainEmbedding' object has no attribute '_langchain_embedding'	0	672	June 21, 2024
TPU VM training - each process loads the dataset	1	472	July 29, 2022
How to set input to validate of T5 Model	1	467	February 21, 2023
DataCollator not padding as expected	0	660	August 17, 2022
Bert Multi-lingual fine-tuning for multilabel classification	0	659	January 25, 2022
How to use multiple context indexes with LLM	0	659	July 6, 2023
Seq2Seq Learning rate	2	379	March 6, 2024
Correct way to define outputs for an Image Model	0	653	July 17, 2022
Tips on structured data translation	0	651	November 18, 2021
Initializing modelingBert as an identity transformation	0	642	December 22, 2021
What's a low enough perplexity value	1	255	October 23, 2024
Chat bot for Question and answer in csv, all open source models	0	641	April 4, 2024
Differentiable Softmax and Argmax Problem	0	639	December 30, 2022
Inference with Multi-Step Reasoning	0	638	March 23, 2023
Explanation of the default "auto" values for DeepSpeed stage 3?	1	450	August 22, 2023
Best practice for finetune LLM	0	636	June 21, 2023
Different sentiments when texts processed in batches vs singles	1	447	July 3, 2022
Mlflow with Hugging Face	0	632	February 14, 2023
StoppingCriteria "scores" always None	1	445	July 7, 2022
HF Inference Usage via organization	4	50	April 3, 2025
Stochastic Sampling with Trainer.evaluate() Logits	3	314	May 6, 2024
Removing .bin files from local repo but not from hub	2	361	July 6, 2023
Determinism in sequence classification	2	361	July 1, 2021
How can I use keyBERT using huggingface inference API?	1	442	June 19, 2021
How does SFTTrainer behave during evaluation?	0	111	October 23, 2024
How to extract attention gradients in bert	0	622	April 16, 2022
How did you create AWS API Gateway w/o 30s timeout?	0	619	April 5, 2021