Topic | Replies | Views | Activity
Fine-tuning with Different Model Heads | 4 | 753 | April 30, 2024
Creating sharded IterableDataset from a list of IterableDatasets? | 2 | 543 | July 2, 2024
An error I've been trying to fix for days now | 4 | 419 | November 19, 2024
Setting seed within model.generate() | 0 | 296 | November 11, 2024
Fine tuning RoBERTa got an unexpected keyword argument 'labels' | 2 | 957 | May 1, 2024
How can I use evaluate's perplexity metric on a model that's already loaded? | 0 | 1656 | July 28, 2023
Cannot Merge Lora weights back to the base model | 8 | 305 | October 29, 2024
FAQ question generation and answering using few shot learning | 1 | 1144 | March 14, 2023
Using a fixed vocabulary? | 2 | 929 | November 8, 2021
How do I deploy Gradio app with Kubernetes? | 0 | 1578 | October 15, 2022
Train loss goes to zero after some epochs | 0 | 280 | August 11, 2023
Interpreting train_loss/val_loss Plot | 3 | 786 | March 24, 2023
WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time when adding rouge-score | 0 | 1572 | February 14, 2022
One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior | 0 | 1568 | February 21, 2021
Weird output from model.generate() | 1 | 1106 | September 21, 2023
Fine tuning facebook/bart-large-mnli zeroshot classifier | 2 | 901 | June 30, 2023
Open-sourcing better cross-encoders for STILTS and better IR? | 2 | 900 | October 9, 2021
Network is Unreachable Error | 0 | 1541 | July 26, 2022
Generating sentence embeddings from pretrained transformers model | 1 | 1089 | January 22, 2021
Probsparse_attention in Informer | 3 | 756 | March 31, 2023
BERT Split NER Labeling | 1 | 1052 | December 7, 2021
A new dataset for multi-label text classification | 1 | 1039 | September 30, 2021
Which weights change when fine-tuning a pre-trained model? | 3 | 733 | June 11, 2024
Constrain output format from beam search in Donut doc classification | 4 | 647 | September 30, 2022
Generating [PAD] tokens during GPT2 inference | 0 | 1423 | August 22, 2022
`serving` signature in TensorFlow Serving blogpost | 2 | 820 | August 9, 2021
How to understand the answer_start parameter of Squad dataset for training BERT-QA model + practical implications for creating custom dataset? | 1 | 1001 | September 1, 2023
Get well adjusted confidence scores from similarity of CLIP encodings | 1 | 558 | July 25, 2024
Using XLA fast text generation with Pegasus models | 5 | 569 | August 25, 2022
Training for GPTQ, possible? | 1 | 982 | October 24, 2023
Sampling: what's the secret sauce? | 2 | 797 | August 22, 2022
Primer on Fine Tuning Text generation models (like GPT) | 0 | 1380 | November 14, 2022
Is there any way to avoid CPU bottlenecks when doing single prompt inference? | 1 | 968 | June 12, 2023
Trouble loading checkpoint shards for microsoft/Phi-3-mini-4k-instruct | 1 | 966 | May 5, 2024
Using .generate with TAPAS as encoder in EncoderDecoder | 4 | 610 | January 18, 2022
Transformer's output as input to other model | 4 | 610 | March 27, 2021
Training Loss 0.0000 and Validation Loss nan | 2 | 140 | March 12, 2025
How do Sequence to Sequence architectures (BART, LED) learn the end of generation? | 2 | 781 | February 14, 2022
Train RoBERTa from scratch for custom dataset | 1 | 945 | May 2, 2023
Read data of pdf or just image format as a part of prompt | 0 | 1333 | May 29, 2023
Identifying and getting right embeddings from the fine tuned BERT on domain specific data | 0 | 1328 | September 8, 2021
TGI with guidance generates weird output when asked to answer in a "structured" way | 3 | 117 | February 17, 2025
Giving attention mask to ppo_trainer | 0 | 233 | May 4, 2024
Decicoder finetune error: understanding naive_attention_prefill | 1 | 520 | September 17, 2023
Save custom transformer as PreTrainedModel | 1 | 924 | September 7, 2021
🤪 Deploying huggingface models to Chai | 1 | 513 | April 29, 2021
Interpreting logs by the trainer | 1 | 910 | May 19, 2023
Invalid image format | 2 | 418 | October 29, 2024
Mistral - Sentence classification - mat1 and mat2 shapes cannot be multiplied | 4 | 574 | November 5, 2024
Distributed inference for datasets created on the fly | 3 | 641 | October 10, 2023