| Topic | Replies | Views | Activity |
|---|---|---|---|
| A custom trainer for multi-task learning? | 1 | 820 | September 18, 2024 |
| How to inject condition into causal model correctly | 0 | 43 | September 18, 2024 |
| Calculate tokens per second while fine-tuning llm? | 0 | 123 | September 17, 2024 |
| Training loop for LoRA | 3 | 248 | September 18, 2024 |
| How to extract tables from images using Hugging Face models? | 1 | 345 | September 17, 2024 |
| Speed up whisper batched inference | 0 | 168 | September 16, 2024 |
| Error with GPTQ for distilbert/distilbert-base-cased | 0 | 20 | September 16, 2024 |
| How to run the Causal Language modelling example on multiple gpu? | 0 | 77 | September 16, 2024 |
| How to set gpu device for hugging trainer? | 1 | 1095 | September 16, 2024 |
| Trainer object high memory usage on Google Cloud Platform Workbench instance | 0 | 30 | September 16, 2024 |
| Problems with trainer.compute_metrics | 1 | 211 | September 15, 2024 |
| T5 models have non-deterministic outputs even after disabling dropout | 9 | 166 | September 15, 2024 |
| Adapter for facebook/sam-vit-huge | 0 | 8 | September 14, 2024 |
| How to finetune Microsoft Phi-2 on Wikitext2 dataset | 2 | 83 | September 14, 2024 |
| Corpus for pre train bert base chinese | 1 | 23 | September 14, 2024 |
| Multiple texts as inputs to Transformers models | 9 | 9953 | September 13, 2024 |
| Impact of resuming from a checkpoint vs training/finetuning from the start | 0 | 23 | September 12, 2024 |
| Why use tokenizer in Trainer with Tokenized Data | 4 | 607 | September 12, 2024 |
| Transformer Trainer no response when evaluate with compute_metrics | 1 | 145 | September 12, 2024 |
| How to calculate tokens per second while fine-tuning llm? | 1 | 1623 | September 12, 2024 |
| Trainer() shows no log for validation loss when using PEFT | 2 | 531 | September 11, 2024 |
| Is it possible to add L1-regularization in Huggingface Trainer? | 2 | 256 | September 11, 2024 |
| Can't train Mamba2 with FP16 (Mamba2ForCausalLM) | 4 | 47 | September 10, 2024 |
| Trainer API for Model Parallelism on Multiple GPUs | 5 | 4088 | September 10, 2024 |
| Defog sqlcoder model download | 4 | 32 | September 10, 2024 |
| ConvNextImageProcessor weird resize behaviour when input image is 224x224 | 2 | 45 | September 10, 2024 |
| Modeling_bert use next-token prediction? | 4 | 147 | September 10, 2024 |
| [DONUT] Typo errors - Document parsing | 1 | 516 | September 10, 2024 |
| How do LLMs identify generation start point during fine-tuning? | 5 | 99 | September 9, 2024 |
| Are dropout layers activated when calling model.generate()? | 2 | 66 | September 7, 2024 |