| Topic | Replies | Views | Activity |
|---|---|---|---|
| Managing Memory for Agents 2.0 | 0 | 47 | October 26, 2024 |
| Prompting llama3.2 to answer questions | 2 | 200 | October 30, 2024 |
| How to convert sentence-transformers/msmarco-distilbert-base-tas-b model to torchscript | 0 | 45 | October 30, 2024 |
| How to create a config.json after saving a model | 21 | 40809 | October 30, 2024 |
| meta-llama/Llama-3.2-11B-Vision-Instruct did not reply | 10 | 12928 | October 29, 2024 |
| Flaky tests in transformers repo | 1 | 19 | October 29, 2024 |
| Impossible to train a model using both bf16 mixed precision training and torch compile, RuntimeError: expected mat1 and mat2 to have the same dtype | 8 | 2212 | October 28, 2024 |
| Local HW specs for Hosting meta-llama/Llama-3.2-11B-Vision-Instruct | 4 | 1819 | October 28, 2024 |
| TypeError: DPODataCollator.__init__() got an unexpected keyword argument 'max_prompt_length' | 0 | 72 | October 28, 2024 |
| Fine-tuning Segment Anything Model: Call up a saved model | 4 | 2290 | October 28, 2024 |
| TFT5ForConditionalGeneration generate returns empty output_scores | 1 | 402 | October 28, 2024 |
| Unstable PPO training: Highly negative KL divergence and highly positive average ratio of batch on LLMs | 0 | 358 | October 27, 2024 |
| Extra GPU usage on custom Qwen2-VL | 0 | 155 | October 28, 2024 |
| Backend low level kernel libraries used in Transformers | 3 | 46 | October 27, 2024 |
| TypeError: '<' not supported between instances of 'NoneType' and 'int' while training wav2vec2 | 1 | 2541 | October 27, 2024 |
| Llama3.2 what is the difference between these 2 loading statements | 3 | 53 | October 26, 2024 |
| Meta/llama3.2 download time | 0 | 33 | October 26, 2024 |
| Repeat Yourself - 🤗 Transformers Design Philosophy | 12 | 2888 | October 10, 2024 |
| Classifier Dropout for *DecoderModel*ForSequenceClassification Classes | 0 | 60 | October 25, 2024 |
| Different metrics score between when training and when merge lora adapter testing | 1 | 133 | October 25, 2024 |
| What the tokens are cross attentions output for? | 1 | 271 | October 25, 2024 |
| Problem with returning decoder cross attentions through generate function | 0 | 27 | October 25, 2024 |
| No benefit from turning on gradient_checkpointing: True | 1 | 183 | October 24, 2024 |
| Load frozen layers from one checkpoint and new layers from second checkpoint? | 0 | 41 | October 23, 2024 |
| Image analysis and comparison of objects with the database | 2 | 114 | October 22, 2024 |
| Storing and loading KV cache | 6 | 1549 | October 21, 2024 |
| Is There a Way to Improve Memory Usage When Using Identical `past_key_values` for All Samples in a Batch? | 3 | 390 | October 21, 2024 |
| New data on same task - fine-tuning or adapter? | 0 | 68 | October 21, 2024 |
| Calculating loss twice but return two different values | 1 | 20 | October 21, 2024 |
| Sequential Prefilling w/ Mamba | 0 | 53 | October 21, 2024 |