| Topic | Replies | Views | Activity |
| --- | ---: | ---: | --- |
| Classifier Dropout for *DecoderModel*ForSequenceClassification Classes | 0 | 4 | October 25, 2024 |
| Impossible to train a model using both bf16 mixed precision training and torch compile, RuntimeError: expected mat1 and mat2 to have the same dtype | 5 | 40 | October 25, 2024 |
| Different metrics score between when training and when merge lora adapter testing | 1 | 15 | October 25, 2024 |
| Backend low level kernel libraries used in Transformers | 0 | 9 | October 25, 2024 |
| What the tokens are cross attentions output for? | 1 | 261 | October 25, 2024 |
| Problem with returning decoder cross attentions through generate function | 0 | 8 | October 25, 2024 |
| No benefit from turning on gradient_checkpointing: True | 1 | 121 | October 24, 2024 |
| T5 variants return Training Loss 0 and Validation loss nan while fine tuning | 7 | 4667 | October 24, 2024 |
| Load frozen layers from one checkpoint and new layers from second checkpoint? | 0 | 29 | October 23, 2024 |
| Padding side in instruction fine-tuning using SFTT | 0 | 21 | October 23, 2024 |
| Image analysis and comparison of objects with the database | 2 | 25 | October 22, 2024 |
| valueError: Supplied state dict for layers does not contain `bitsandbytes__*` and possibly other `quantized_stats` (when load saved quantized model) | 2 | 34 | October 22, 2024 |
| Multi-gpu huggingface training using trl | 0 | 25 | October 22, 2024 |
| How to cache common instruction prompt | 14 | 465 | October 22, 2024 |
| Model loading gets stuck when calling "from_pretrained" | 1 | 14 | October 21, 2024 |
| Difference in model prediction before saving and after loading | 2 | 181 | October 21, 2024 |
| Storing and loading KV cache | 6 | 435 | October 21, 2024 |
| Is There a Way to Improve Memory Usage When Using Identical `past_key_values` for All Samples in a Batch? | 3 | 331 | October 21, 2024 |
| New data on same task - fine-tuning or adapter? | 0 | 12 | October 21, 2024 |
| Calculating loss twice but return two different values | 1 | 9 | October 21, 2024 |
| Sequential Prefilling w/ Mamba | 0 | 17 | October 21, 2024 |
| Error Using Pydantic with LangChain and local model by Hugging Face for Structured Output | 1 | 392 | October 20, 2024 |
| How to parallel infer multiple input sentences with beam search = 4? | 0 | 8 | October 20, 2024 |
| Trying to understand system prompts with Llama 2 and transformers interface | 9 | 36086 | October 19, 2024 |
| Reproducible model between SetFit Versions? | 2 | 29 | October 18, 2024 |
| Unable to achieve better performance with transformer than LSTM | 0 | 370 | October 17, 2024 |
| @huggingface/transformers library won't run under node-alpine | 0 | 11 | October 17, 2024 |
| Summary of the meeting in Indonesian | 0 | 10 | October 17, 2024 |
| Machine learning | 1 | 19 | October 17, 2024 |
| First time to AI - apps. Do I need a GPU in order to run a model using transformers? | 1 | 39 | October 17, 2024 |