| Topic | Replies | Views | Activity |
|---|---|---|---|
| Doing inference with FSDP during training affects checkpointing | 1 | 399 | October 31, 2024 |
| Managing Memory for Agents 2.0 | 0 | 39 | October 26, 2024 |
| Prompting llama3.2 to answer questions | 2 | 191 | October 30, 2024 |
| How to convert sentence-transformers/msmarco-distilbert-base-tas-b model to torchscript | 0 | 40 | October 30, 2024 |
| How to create a config.json after saving a model | 21 | 40466 | October 30, 2024 |
| meta-llama/Llama-3.2-11B-Vision-Instruct did not reply | 10 | 12897 | October 29, 2024 |
| Flaky tests in transformers repo | 1 | 19 | October 29, 2024 |
| Impossible to train a model using both bf16 mixed precision training and torch compile, RuntimeError: expected mat1 and mat2 to have the same dtype | 8 | 1746 | October 28, 2024 |
| Local HW specs for Hosting meta-llama/Llama-3.2-11B-Vision-Instruct | 4 | 1627 | October 28, 2024 |
| TypeError: DPODataCollator.__init__() got an unexpected keyword argument 'max_prompt_length' | 0 | 67 | October 28, 2024 |
| Fine-tuning Segment Anything Model: Call up a saved model | 4 | 2258 | October 28, 2024 |
| TFT5ForConditionalGeneration generate returns empty output_scores | 1 | 382 | October 28, 2024 |
| Unstable PPO training: Highly negative KL divergence and highly positive average ratio of batch on LLMs | 0 | 293 | October 27, 2024 |
| Extra GPU usage on custom Qwen2-VL | 0 | 146 | October 28, 2024 |
| Backend low level kernel libraries used in Transformers | 3 | 45 | October 27, 2024 |
| TypeError: '<' not supported between instances of 'NoneType' and 'int' while training wav2vec2 | 1 | 2508 | October 27, 2024 |
| Llama3.2 what is the difference between these 2 loading statements | 3 | 52 | October 26, 2024 |
| Meta/llama3.2 download time | 0 | 30 | October 26, 2024 |
| Repeat Yourself - 🤗 Transformers Design Philosophy | 12 | 2807 | October 10, 2024 |
| Classifier Dropout for *DecoderModel*ForSequenceClassification Classes | 0 | 58 | October 25, 2024 |
| Different metrics score between when training and when merge lora adapter testing | 1 | 114 | October 25, 2024 |
| What the tokens are cross attentions output for? | 1 | 269 | October 25, 2024 |
| Problem with returning decoder cross attentions through generate function | 0 | 25 | October 25, 2024 |
| No benefit from turning on gradient_checkpointing: True | 1 | 160 | October 24, 2024 |
| Load frozen layers from one checkpoint and new layers from second checkpoint? | 0 | 41 | October 23, 2024 |
| Image analysis and comparison of objects with the database | 2 | 100 | October 22, 2024 |
| Multi-gpu huggingface training using trl | 0 | 404 | October 22, 2024 |
| Storing and loading KV cache | 6 | 1396 | October 21, 2024 |
| Is There a Way to Improve Memory Usage When Using Identical `past_key_values` for All Samples in a Batch? | 3 | 386 | October 21, 2024 |
| New data on same task - fine-tuning or adapter? | 0 | 49 | October 21, 2024 |