| Topic | Replies | Views | Activity |
|---|---|---|---|
| How to calculate tokens per second while fine-tuning llm? | 1 | 1244 | September 12, 2024 |
| Transformer Trainer no response when evaluate with compute_metrics | 0 | 3 | September 11, 2024 |
| The checkpoint you are trying to load has model type `gemma2` but Transformers does not recognize this architecture | 5 | 2071 | September 11, 2024 |
| Trainer() shows no log for validation loss when using PEFT | 2 | 356 | September 11, 2024 |
| ValueError: Input sequence length (100) doesn't match model configuration (32) | 0 | 4 | September 11, 2024 |
| Is it possible to add L1-regularization in Huggingface Trainer? | 1 | 10 | September 11, 2024 |
| The effect of padding_side | 10 | 6967 | September 10, 2024 |
| Can't train Mamba2 with FP16 (Mamba2ForCausalLM) | 4 | 13 | September 10, 2024 |
| Trainer API for Model Parallelism on Multiple GPUs | 5 | 3264 | September 10, 2024 |
| Defog sqlcoder model download | 4 | 5 | September 10, 2024 |
| ConvNextImageProcessor weird resize behaviour when input image is 224x224 | 2 | 16 | September 10, 2024 |
| Modeling_bert use next-token prediction? | 3 | 6 | September 10, 2024 |
| [DONUT] Typo errors - Document parsing | 1 | 468 | September 10, 2024 |
| Trainer.train() runs for long and appears to be stuck. How do I know that it's processing and not in loop | 1 | 6 | September 9, 2024 |
| How do LLMs identify generation start point during fine-tuning? | 5 | 14 | September 9, 2024 |
| Are dropout layers activated when calling model.generate()? | 2 | 9 | September 7, 2024 |
| Multi-GPU Operation mistralai/Mistral-Large-Instruct-2407 | 0 | 4 | September 7, 2024 |
| Fitting huge models on multiple nodes | 0 | 8 | September 6, 2024 |
| Getting sslcertverificationerror exception | 0 | 6 | September 6, 2024 |
| How to fine-tune "openai-gpt" model for sequence classification? | 3 | 1186 | September 5, 2024 |
| When using greedy decoding on a causal LM, how does `generate` handle tie-breaking between logits? | 0 | 5 | September 5, 2024 |
| Why does `generate` in `LlamaForCausalLM` give me _slightly_ lower logits than `__call__`? | 1 | 6 | September 5, 2024 |
| What's the Difference Between max_length and max_new_tokens? | 0 | 8 | September 5, 2024 |
| How to continue training with HuggingFace Trainer? | 4 | 6663 | September 5, 2024 |
| Do not save runs (TensorBoard) after the epoch has ended | 0 | 2 | September 5, 2024 |
| Confused about max_length and max_new_tokens | 7 | 27925 | September 5, 2024 |
| Llama-2 find answer in a transcript | 0 | 2 | September 5, 2024 |
| How to continue training a model from where it left off? | 0 | 6 | September 5, 2024 |
| How to use ViT MAE for image classification? | 3 | 1705 | September 4, 2024 |
| Flash attention has no effect on inference | 7 | 8325 | September 4, 2024 |