| Topic | Replies | Views | Activity |
| Sampling strategies | 1 | 515 | April 4, 2023 |
| Mapping text that describes connected devices to a JSON object with chosen shape | 2 | 417 | December 19, 2023 |
| Hyperparameter-Search while adding Special tokens | 1 | 508 | January 28, 2024 |
| Role of past_key_value in self attention | 0 | 714 | November 23, 2021 |
| Unpacking transformer's trainer.eval() to see every example's output, loss | 4 | 319 | April 9, 2024 |
| Having issues finetuning a Bert model pretrained from scratch on downstream task (GLUE Dataset)! | 0 | 712 | March 26, 2022 |
| ValueError: {'code': None, 'message': 'ModelMetaclass object argument after ** must be a mapping, not str' | 4 | 56 | December 27, 2024 |
| Accelerate: 'RobertaModel' object has no attribute 'roberta' | 1 | 496 | August 29, 2023 |
| Text Web UI Generation Blanks (goes Black) on a Character | 0 | 700 | September 17, 2023 |
| Trying to recreate `model.greedy_search()` for custom decoding of LLM output, but I am getting a different decoded output | 3 | 349 | February 8, 2024 |
| Is iterative training advisable? | 3 | 349 | December 27, 2023 |
| QA model with human like answers | 1 | 489 | February 4, 2023 |
| Qlora - 8 bit quantization using bitsandbytes gives error for owl-vit model | 1 | 488 | April 12, 2024 |
| XLMR-large not converging on Paws-X paraphrase dataset but mbert does | 1 | 488 | May 3, 2021 |
| Load pre-trained models inside containerized pipeline for multi-lingual translation | 0 | 690 | November 16, 2022 |
| DownloadAndLoadFlorence2Model 401 Client Error | 1 | 273 | February 10, 2025 |
| Lack of pipeline parallelism examples for image-based transformers | 3 | 61 | October 7, 2024 |
| Smart Batching - speed up Bert finetune | 0 | 682 | March 15, 2021 |
| LLM Inference hosting issue | 2 | 392 | December 4, 2023 |
| Approach for Creating a Real-Time Speech-to-Speech Model with Emotions, Laughter, and Crying—aka "The Perfect Voice Changer" | 1 | 272 | February 24, 2025 |
| Identifying most useful domain-specific tokens for adding to the existing tokenizer | 1 | 478 | February 2, 2024 |
| TRL + PPO + Using Conditioned Reference Model | 3 | 60 | January 27, 2025 |
| Running low on GPU memory on a cluster with ESM2 lowest config | 2 | 388 | December 5, 2023 |
| AttributeError: 'LangchainEmbedding' object has no attribute '_langchain_embedding' | 0 | 672 | June 21, 2024 |
| TPU VM training - each process loads the dataset | 1 | 472 | July 29, 2022 |
| How to set input to validate of T5 Model | 1 | 467 | February 21, 2023 |
| DataCollator not padding as expected | 0 | 660 | August 17, 2022 |
| Bert Multi-lingual fine-tuning for multilabel classification | 0 | 659 | January 25, 2022 |
| How to use multiple context indexes with LLM | 0 | 659 | July 6, 2023 |
| Seq2Seq Learning rate | 2 | 379 | March 6, 2024 |
| Correct way to define outputs for an Image Model | 0 | 653 | July 17, 2022 |
| Tips on structured data translation | 0 | 651 | November 18, 2021 |
| Initializing modelingBert as an identity transformation | 0 | 642 | December 22, 2021 |
| What's a low enough perplexity value | 1 | 255 | October 23, 2024 |
| Chat bot for Question and answer in csv, all open source models | 0 | 641 | April 4, 2024 |
| Differentiable Softmax and Argmax Problem | 0 | 639 | December 30, 2022 |
| Inference with Multi-Step Reasoning | 0 | 638 | March 23, 2023 |
| Explanation of the default "auto" values for DeepSpeed stage 3? | 1 | 450 | August 22, 2023 |
| Best practice for finetune LLM | 0 | 636 | June 21, 2023 |
| Different sentiments when texts processed in batches vs singles | 1 | 447 | July 3, 2022 |
| Mlflow with Hugging Face | 0 | 632 | February 14, 2023 |
| StoppingCriteria "scores" always None | 1 | 445 | July 7, 2022 |
| HF Inference Usage via organization | 4 | 50 | April 3, 2025 |
| Stochastic Sampling with Trainer.evaluate() Logits | 3 | 314 | May 6, 2024 |
| Removing .bin files from local repo but not from hub | 2 | 361 | July 6, 2023 |
| Determinism in sequence classification | 2 | 361 | July 1, 2021 |
| How can I use keyBERT using huggingface inference API? | 1 | 442 | June 19, 2021 |
| How does SFTT trainer behave during evaluation? | 0 | 111 | October 23, 2024 |
| How to extract attention gradients in bert | 0 | 622 | April 16, 2022 |
| How did you create AWS API Gateway w/o 30s timeout? | 0 | 619 | April 5, 2021 |