Llama3 so much slow compared to ollama
|
|
13
|
5532
|
November 1, 2024
|
/home/user/app/llama32-omran.ipynb disappeared
|
|
2
|
30
|
November 1, 2024
|
RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CU
|
|
2
|
91
|
November 1, 2024
|
How to avert 'loading checkpoint shards'?
|
|
4
|
9220
|
November 1, 2024
|
Using persistent storage on HF spaces
|
|
1
|
26
|
November 1, 2024
|
How to cache common instruction prompt
|
|
16
|
793
|
October 31, 2024
|
Doing inference with FSDP during training affects checkpointing
|
|
1
|
102
|
October 31, 2024
|
Managing Memory for Agents 2.0
|
|
0
|
21
|
October 26, 2024
|
Saving checkpoints *only* on improvement
|
|
0
|
23
|
October 30, 2024
|
Prompting llama3.2 to answer questions
|
|
2
|
92
|
October 30, 2024
|
[kv cache merge] I want to know if the result of calculating their respective k v cache and concatenating them together is correct
|
|
4
|
16
|
October 30, 2024
|
How to convert sentence-transformers/msmarco-distilbert-base-tas-b model to torchscript
|
|
0
|
7
|
October 30, 2024
|
How to create a config.json after saving a model
|
|
21
|
38274
|
October 30, 2024
|
meta-llama/Llama-3.2-11B-Vision-Instruct did not reply
|
|
10
|
12071
|
October 29, 2024
|
Flaky tests in transformers repo
|
|
1
|
18
|
October 29, 2024
|
Impossible to train a model using both bf16 mixed precision training and torch compile, RuntimeError: expected mat1 and mat2 to have the same dtype
|
|
8
|
197
|
October 28, 2024
|
Local HW specs for Hosting meta-llama/Llama-3.2-11B-Vision-Instruct
|
|
4
|
126
|
October 28, 2024
|
TypeError: DPODataCollator.__init__() got an unexpected keyword argument 'max_prompt_length'
|
|
0
|
31
|
October 28, 2024
|
Fine-tuning Segment Anything Model: Call up a saved model
|
|
4
|
2029
|
October 28, 2024
|
TFT5ForConditionalGeneration generate returns empty output_scores
|
|
1
|
293
|
October 28, 2024
|
Unstable PPO training: Highly negative KL divergence and highly positive average ratio of batch on LLMs
|
|
0
|
46
|
October 27, 2024
|
Extra GPU usage on custom Qwen2-VL
|
|
0
|
46
|
October 28, 2024
|
Backend low level kernel libraries used in Transformers
|
|
3
|
38
|
October 27, 2024
|
TypeError: '<' not supported between instances of 'NoneType' and 'int' while training wav2vec2
|
|
1
|
2238
|
October 27, 2024
|
Llama3.2 what is the difference between these 2 loading statements
|
|
3
|
31
|
October 26, 2024
|
Meta/llama3.2 download time
|
|
0
|
19
|
October 26, 2024
|
KV caching for varying length texts
|
|
0
|
119
|
October 26, 2024
|
Repeat Yourself - 🤗 Transformers Design Philosophy
|
|
13
|
2597
|
October 26, 2024
|
Classifier Dropout for *DecoderModel*ForSequenceClassification Classes
|
|
0
|
29
|
October 25, 2024
|
Different metrics score between when training and when merge lora adapter testing
|
|
1
|
21
|
October 25, 2024
|