| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Create multiple dataset subsets at the same time | 0 | 105 | December 8, 2024 |
| Transformers Pretrained model import | 3 | 712 | December 9, 2024 |
| Why does PALIGemma use 256 tokens for a 224x224 image | 0 | 33 | December 8, 2024 |
| CUDA error: device-side assert triggered on device_map="auto" | 4 | 1638 | December 8, 2024 |
| I need some recommendation or advice on a fast VQA (visual question answering) model. I really don't know how to look for them | 0 | 83 | December 7, 2024 |
| Decision Transformer for Discrete action | 5 | 423 | December 7, 2024 |
| Whisper medium finetuning RTX 4090 mostly stays idle | 5 | 276 | December 7, 2024 |
| Pretrain model not accepting optimizer | 30 | 4771 | December 7, 2024 |
| And torch.cuda.empty_cache() fail? | 2 | 17 | December 9, 2024 |
| Max Seq Lengths | 1 | 568 | December 6, 2024 |
| Does setting max_seq_length to a too large number for fine tuning LLM using SFTTrainer affect model training? | 1 | 1890 | December 6, 2024 |
| How to use I-JEPA for image classification | 4 | 1968 | December 6, 2024 |
| Looking for a Tiny LLM (max 1.5GB) – Need Advice | 6 | 8330 | December 6, 2024 |
| Improving precision of ViT for image classification | 0 | 79 | December 6, 2024 |
| Tumblr Free Redirect Script | 2 | 44 | December 9, 2024 |
| BERT Model - OSError | 3 | 5005 | December 6, 2024 |
| Using DETR with custom backbone | 3 | 641 | December 6, 2024 |
| Evaluation metrics for BERT-like LMs | 4 | 4623 | December 6, 2024 |
| Pretraining T5 from scratch using MLM | 1 | 397 | December 6, 2024 |
| Albert pre-train convergence problem | 1 | 634 | December 6, 2024 |
| LLMA model using Hugging Face: Getting no access | 1 | 116 | December 6, 2024 |
| Why the memory usage is higher than expected when loading nvidia/NV-Embed-v2 model with FP16 precision? | 0 | 96 | December 6, 2024 |
| TextIteratorStreamer compatibility with batch processing | 3 | 1444 | December 6, 2024 |
| Can we run custom quantized llama3-8b on NPU? | 0 | 62 | December 6, 2024 |
| Fine tune "meta-llama/Llama-2-7b-hf" Bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward) | 15 | 189 | December 6, 2024 |
| Need a Model for Extracting Relevant Keywords for Given Titles | 1 | 455 | December 6, 2024 |
| Why does moving ML model initialization into a function prevent GPU OOM errors when del, gc.collect(), and torch.cuda.empty_cache() fail? | 0 | 99 | December 5, 2024 |
| DDP error for LoRA SFT | 1 | 189 | December 5, 2024 |
| Trainer is not saving all layers when fine-tuning Llama with P-Tuning | 0 | 45 | December 5, 2024 |
| Pretrained Models to Heroku Production Environment | 5 | 1830 | July 10, 2020 |