| Topic | Replies | Views | Activity |
|---|---|---|---|
| AutoTrain error with Sequential data on evaluation loop | 3 | 308 | March 10, 2024 |
| Using TFBertTokenizer with tf.data.Dataset | 3 | 288 | March 10, 2024 |
| Merging two models | 1 | 668 | March 9, 2024 |
| Jax and flax version used for the new gemma models | 1 | 277 | March 9, 2024 |
| How to pass input to a Reward Model and make sense of its output? | 1 | 383 | March 8, 2024 |
| Has anyone come across BERT fine-tuned for CLM task? | 0 | 84 | March 8, 2024 |
| Deepspeed inference stage 3 + quantization | 0 | 972 | March 8, 2024 |
| Custom tokenizer: finetune model or retrain model? | 1 | 888 | March 8, 2024 |
| How to Decode InputIDs back to String in LayoutLMV3 | 2 | 1350 | March 8, 2024 |
| Fine-tuning llama2 with multiple GPU hugging face trainer | 8 | 3347 | March 7, 2024 |
| No Simple way to add a ValueHead on top of existing HuggingFace Model while Preserving all PreTrainedModel Functionalities? | 0 | 154 | March 7, 2024 |
| Batch_size, seq_length = input_shape ValueError: too many values to unpack (expected 2) Transformer Sentence Similarity Classification | 16 | 1115 | March 8, 2024 |
| How does compute/resource allocation work for hyperparam search? | 0 | 105 | March 7, 2024 |
| Auto Model for Sequence Classification takes more than 20 minutes to classify a single sequence | 3 | 244 | March 7, 2024 |
| We are facing the above error, I give code, please debug the code to correct manner | 0 | 237 | March 7, 2024 |
| huggingface_hub.utils.validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: | 0 | 2389 | March 7, 2024 |
| Compute_metrics does not find tokenizer (whisper finetuning) | 1 | 314 | March 6, 2024 |
| How to train my model on multiple GPU | 2 | 1937 | March 6, 2024 |
| Saving checkpoint is too slow with deepspeed | 5 | 2757 | March 6, 2024 |
| CUDA out of memory on multi-GPU | 1 | 2629 | March 6, 2024 |
| Extracting logits from vision language models at inference time | 0 | 146 | March 6, 2024 |
| Training arguments modification and tuning | 0 | 209 | March 5, 2024 |
| How to increase the width of hidden linear layers in Mistral 7B model? | 1 | 284 | March 5, 2024 |
| Self-attention extraction from Long T5 | 0 | 243 | March 5, 2024 |
| SDPA attention in e.g. Llama does not use fused accelerations | 0 | 821 | March 5, 2024 |
| Fine-tuning for Specific Medical Domains to Reduce Loss Stagnation | 0 | 298 | March 5, 2024 |
| Fine-tuning LLM for regression yields low loss during training but not in inference? | 2 | 4412 | March 4, 2024 |
| Challenges Achieving Satisfactory Accuracy in Fine-Tuning RoBERTa on a Custom Masked Token Prediction Dataset | 2 | 296 | March 4, 2024 |
| Transformer pipeline load local pipeline | 8 | 8626 | March 4, 2024 |
| Minimal OS Linux requirements to run transformers | 0 | 178 | March 4, 2024 |