How to evaluate before first training step? | 10 | 6428 | August 3, 2024
Loading a peft model which is saved on multiple nodes using sharded_state_dict? | 0 | 32 | August 2, 2024
Dynamically resizing input for Hugging Face's generate() | 0 | 30 | August 2, 2024
Calibrate Probabilities for Transformers Classifier | 0 | 64 | August 2, 2024
Difference between AutoModelForCausalLM and peft_model.merge_and_unload() for a LoRA model during inference | 2 | 1299 | August 2, 2024
Accelerating inference for local HuggingFacePipeline of Llama3 | 0 | 87 | August 1, 2024
Custom QA model overfitting | 0 | 8 | August 1, 2024
ValueError: cannot reshape array of size (GGUF) | 4 | 638 | July 31, 2024
Change Generation Config of Transformer Model without getting UserWarning | 0 | 106 | July 31, 2024
Trainer not passing all features of the dataset? | 3 | 31 | July 30, 2024
How to save the best trial's model using `trainer.hyperparameter_search` | 5 | 2658 | July 30, 2024
How does one create a custom Hugging Face model with an already working tokenizer? | 1 | 954 | July 29, 2024
How to load a model fine-tuned with QLoRA | 2 | 6409 | July 29, 2024
PEFT LoRA GPT-NeoX - Backward pass failing | 7 | 7011 | July 29, 2024
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn | 3 | 13409 | July 29, 2024
Compatible Streamer with Beam Search | 1 | 175 | July 29, 2024
Transformers llama flash_attn_varlen_func questions | 0 | 192 | July 29, 2024
How to add a stop sequence to a Pipeline? | 0 | 255 | July 29, 2024
compute_metrics() behaves strangely in distributed setting | 0 | 45 | July 28, 2024
Hyperparameter search with wandb | 1 | 217 | July 28, 2024
How to do parameter-efficient fine-tuning of the decoder in an encoder-decoder model? | 4 | 135 | July 27, 2024
Error Loading Custom Transformers.js model from Hugging Face Hub | 1 | 613 | July 27, 2024
Why does Pipeline inference on CPU with PyTorch for wav2vec use only 50% of the CPU, and does chunk length affect model speed? | 1 | 623 | July 26, 2024
DeepSpeed error: a leaf Variable that requires grad is being used in an in-place operation | 1 | 77 | July 26, 2024
adapter_model.safetensors size is very big | 3 | 572 | July 27, 2024
[T5] How to control the length of the generated summaries | 0 | 32 | July 26, 2024
A question about code on Mistral-7B attention | 0 | 75 | July 26, 2024
Epoch does not get updated | 0 | 5 | July 25, 2024
ReactCodeAgent - Local LLM | 0 | 58 | July 25, 2024
How to perform batch inference on GroundingDino model | 2 | 562 | July 25, 2024