Clarification on the attention_mask | 4 | 24229 | May 3, 2024
Retraining pre-trained NER model with new data samples | 1 | 409 | May 3, 2024
PerceiverIO Output Query Array Doubts | 1 | 129 | May 3, 2024
How do I avoid LLM rambling? | 0 | 378 | May 3, 2024
Tensor parallel in Pytorch 2.3 | 0 | 206 | May 2, 2024
Convert Pytorch Model to Huggingface Transformer? | 2 | 11074 | May 2, 2024
Customizing T5 tokenizer for finetuning | 1 | 638 | May 2, 2024
Node: 'model/swin_transformer/tf_swin_model/swin/encoder/layers.1/blocks.0/Reshape_33' Input to reshape is a tensor with 3763200 values, but the requested shape requires a multiple of 20384 | 0 | 101 | May 2, 2024
pre-train_BERT for a specific corpus | 0 | 74 | May 2, 2024
Inference API offline model limit | 1 | 923 | May 2, 2024
A model to extract email text body from html code | 4 | 636 | May 2, 2024
T5 generates repetitive sentences | 3 | 782 | May 2, 2024
Training multiple times in one script | 0 | 205 | May 2, 2024
Setting "num_beams" and using "past_key_values" when calling .generate() | 0 | 224 | May 2, 2024
I cannot find the code that transformers trainer model_wrapped by deepspeed , i can find the theory about model_wrapped was wraped by DDP(Deepspeed(transformer model )) ,but i only find the code transformers model wrapped by ddp, where is the deepspeed wr | 1 | 137 | May 1, 2024
How to create a new Hugging face model by using already available hugging face models | 2 | 152 | May 1, 2024
Fine tuning gguf models? | 1 | 1440 | April 30, 2024
meta-llama/Meta-Llama-3-8B is giving empty responses when I use with transformers | 0 | 264 | April 30, 2024
Need to set re_entrant to true with latest transformers | 1 | 1259 | April 29, 2024
How to pass the api token using transformers candle (rust)? | 1 | 169 | April 29, 2024
Learning rate for the `Trainer` in a multi gpu setup | 4 | 644 | April 29, 2024
Script stops upon setting the model | 0 | 96 | April 29, 2024
How to convert natural languages into vec? | 2 | 95 | April 29, 2024
Negative Kl values during PPO training (TRL library) | 0 | 349 | April 28, 2024
DPOTrainer and sequence length | 0 | 121 | April 27, 2024
ValueError: attention_mask is missing in the dataloader | 0 | 243 | April 27, 2024
Training Reformer model from scratch with deepspeed - backprop error | 0 | 100 | April 26, 2024
What should I do if I want to use local dataset xsum in this project | 0 | 93 | April 26, 2024
Train T5 from scratch | 4 | 3559 | April 26, 2024
(Audio-to-audio models) Should I use 2 models sequentially or create 1 model for attempting to make a music to music model? | 0 | 109 | April 26, 2024