| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Forcing BERT hidden dimension size | 1 | 1160 | December 19, 2023 |
| Loading checkpoint shards very slow | 1 | 7721 | December 19, 2023 |
| How to perform training on CPU + GPU offloading? | 1 | 1644 | December 19, 2023 |
| How to deploy larger model inference on multiple machines with multiple GPUs? | 1 | 2605 | December 19, 2023 |
| I was trying to fine-tune Llama 2 for a specific use case; after fine-tuning, when I try to load the fine-tuned model locally I get the error mentioned below | 1 | 880 | December 19, 2023 |
| Generating text word by word | 2 | 909 | December 19, 2023 |
| Logits function too slow | 0 | 226 | December 19, 2023 |
| Time series prediction: inference process | 1 | 1800 | December 19, 2023 |
| Train a simple PyTorch model with transformers Trainer | 0 | 126 | December 19, 2023 |
| Trainer: How can I log model outputs besides loss? | 0 | 272 | December 18, 2023 |
| Replacing the decoder of an xxxEncoderDecoderModel | 2 | 1708 | December 16, 2023 |
| Error - RuntimeError | 0 | 734 | December 15, 2023 |
| Setting target language codes in mT5 | 0 | 146 | December 15, 2023 |
| Unable to inference in 8bit mode: 'NoneType' object has no attribute 'device' | 4 | 2288 | December 14, 2023 |
| How to merge multiple LoRA back to base model? | 0 | 668 | December 14, 2023 |
| Embeddings from llama2 | 6 | 12408 | December 13, 2023 |
| Total_flos vs C = 6 * N * D | 1 | 800 | December 13, 2023 |
| Do we really need a very large dataset to train GPT? | 0 | 114 | December 13, 2023 |
| Unable to load ALMA-13B model from HF | 0 | 171 | December 13, 2023 |
| Convert a BERT model trained in PyTorch into a Hugging Face model | 0 | 490 | December 12, 2023 |
| Converting CLIP to CoreML | 13 | 3252 | December 12, 2023 |
| Fine-tuning LayoutLM: Gradients not updated error in training | 0 | 215 | December 12, 2023 |
| How do I disable gradient syncing between workers in distributed training when using Trainer? | 0 | 103 | December 11, 2023 |
| Question about the output of the decision transformer | 0 | 154 | December 11, 2023 |
| Fine-tuned Mistral 7B inference issue for >4k context length tokens with transformers 4.35+ | 0 | 561 | December 11, 2023 |
| How to calculate the price of hosting transformers for semantic search | 2 | 296 | December 10, 2023 |
| Hugging Face Data Collator: Index put requires the source and destination dtypes match, got Float for the destination and Long for the source | 10 | 2478 | December 10, 2023 |
| Llama2-70b SafetensorError: Error while deserializing header: HeaderTooLarge | 0 | 1181 | December 9, 2023 |
| Zero loss while fine-tuning Llama 2 using SFT trainer and the use of collator | 1 | 1917 | December 9, 2023 |
| Summary length for knowledge graphs vs long documents | 0 | 136 | December 9, 2023 |