| Num_return_sequences > num_beams | 3 | 9 | November 13, 2025 |
| Debugging inf/NaN Loss in Multi-Process Optuna/PyTorch Lightning HPO in Colab | 2 | 6 | November 13, 2025 |
| Why does using `TextIteratorStreamer` result in so many empty outputs? | 6 | 25 | November 11, 2025 |
| Creating language model only Lora Config | 3 | 25 | November 10, 2025 |
| IndexError: index -1 is out of bounds for dimension 0 with size 0 | 3 | 25 | November 7, 2025 |
| How to use Qwen3-VL generate() with num_return_sequences > 1? | 3 | 24 | November 6, 2025 |
| How can I get a list of word segmentation results for non-English string? | 14 | 38 | November 6, 2025 |
| PEFT with SFTTrainer unexpected 'resume_from_checkpoint' | 2 | 22 | November 6, 2025 |
| Model fine-tuning not respecting `<|endoftext|>` stop tokens during training | 1 | 17 | November 4, 2025 |
| Additional_chat_templates does not exist on "main" | 5 | 173 | November 3, 2025 |
| [Research/Discussion] Depth-agnostic stability for residual models (no extra norms, no tuning). Is this useful to you? | 0 | 7 | November 3, 2025 |
| Xcode Can't Find swift-transformers Package | 1 | 17 | November 2, 2025 |
| AutoTokenizer 404 error issue | 3 | 126 | November 2, 2025 |
| Doing inference with FSDP during training affects checkpointing | 3 | 609 | November 1, 2025 |
| Trainer being very slow to init training setting group_by_length to True | 4 | 367 | October 29, 2025 |
| Unable to Run Sentence Transformer Text embedding in Docker | 2 | 604 | October 29, 2025 |
| Training with Trainer really slow | 1 | 1695 | October 27, 2025 |
| ValueError when using PatchTSTForClassification | 6 | 175 | October 27, 2025 |
| Retrieving avg_logprob and other metrics for segments using whisper | 1 | 23 | October 23, 2025 |
| Model loading gets stuck when calling "from_pretrained" | 10 | 1261 | October 23, 2025 |
| Contrastive search output type issue | 0 | 10 | October 23, 2025 |
| Getting -100 in predictions from T5 during compute_metrics | 2 | 25 | October 22, 2025 |
| Model not using all attention layers while inferencing on device_map="auto" | 2 | 24 | October 21, 2025 |
| WARN Status Code: 500 | 13 | 395 | October 20, 2025 |
| 500 Internal Server Error when downloading model files (works for metadata, fails on large files) | 1 | 123 | October 20, 2025 |
| PatchTSMixerForPrediction error with prediction of len 1 | 3 | 142 | October 19, 2025 |
| How to avert 'loading checkpoint shards'? | 5 | 13668 | October 17, 2025 |
| How can I train a Polish-English translation Transformer model from scratch using PyTorch or Hugging Face? | 0 | 16 | October 16, 2025 |
| SpiralTorch — a Rust-first ML stack that trains in Z-space (WebGPU/WASM/MPS/CUDA). It ships a tokenizer-free pre-embedding path and a Canvas Transformer projector that can feed HF Transformers via `inputs_embeds` | 1 | 21 | October 15, 2025 |
| Question about bf16 in Transformers | 2 | 42 | October 13, 2025 |