Topic | Replies | Views | Activity
Training Reformer model from scratch with deepspeed - backprop error | 0 | 19 | April 26, 2024
What should I do if I want to use local dataset xsum in this project | 0 | 20 | April 26, 2024
How to output loss from model.generate()? | 15 | 4037 | April 26, 2024
Train T5 from scratch | 4 | 3029 | April 26, 2024
(Audio-to-audio models) Should I use 2 models sequentially or create 1 model for attempting to make a music to music model? | 0 | 18 | April 26, 2024
Does generate's max_length influence training? | 0 | 27 | April 25, 2024
Finetuned State-space/mamba model not working on huggingface model | 0 | 21 | April 25, 2024
DPOTrainer consumes lots of VRAM | 0 | 25 | April 25, 2024
How to evaluate before first training step? | 8 | 3617 | April 25, 2024
Error to import transformers[torch] or accelerate -U | 0 | 26 | April 25, 2024
Getting error - trainer.train() | 3 | 353 | April 25, 2024
Prohibitively large RAM consumption on Trainer validation | 2 | 74 | April 24, 2024
ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices | 9 | 19912 | April 24, 2024
Multiple time fine-tuning VideoMAE model adding n class each time | 0 | 30 | April 24, 2024
How to not show the progress bar for evaluation only? | 1 | 64 | April 24, 2024
How to disable Huggingface Hub during Trainer saving of PEFT models? | 2 | 76 | April 24, 2024
Multivariate time-series transformer | 0 | 33 | April 24, 2024
TypeError: map() got an unexpected keyword argument 'num_proc' | 0 | 38 | April 24, 2024
Using Trainer class + 4/8 bit quantised model for prediction | 0 | 40 | April 24, 2024
Why the model loading of llama2 is so slow? | 6 | 5993 | April 24, 2024
Out of bounds Error in label conversion , two labels getting converted to 0 and 247 | 0 | 31 | April 24, 2024
PerceiverIO Output Query Array Doubts | 0 | 44 | April 23, 2024
SSL Certificate Issue | 5 | 10164 | April 23, 2024
How to cluster words into semantic entities, when performing information extraction? | 2 | 794 | April 23, 2024
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! | 24 | 67288 | April 23, 2024
Program hangs when creating a transformers.TrainingArguments object | 2 | 262 | April 23, 2024
Unable to open file 'model.bin' in model 'ct2fast_m2m100_418M' | 1 | 172 | April 22, 2024
Confusing Benchmark results Running whisper on 4080 Super vs A10 vs H100 | 0 | 51 | April 22, 2024
How to finetune with a own private data and then build chatbot on that? | 5 | 6906 | April 22, 2024
Conversion from finetune m2m_100 model to huggingface format | 0 | 42 | April 22, 2024