| Topic | Replies | Views | Activity |
|---|---|---|---|
| Swapping GPT-2 Attention with Flash Attention | 3 | 1281 | June 4, 2023 |
| How to use the wav2vec2-large-TIMIT-IPA2 model? | 0 | 23 | June 4, 2023 |
| TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit' | 2 | 384 | June 3, 2023 |
| PyTorchBenchmark pickle local object error | 0 | 18 | June 3, 2023 |
| Progress bar for HF pipelines | 7 | 3541 | June 3, 2023 |
| Bert-base-uncased performs badly in next sentence prediction (bookcorpus) | 0 | 23 | June 2, 2023 |
| Forward() got an unexpected keyword argument 'attention_mask' in Whisper Tutorial | 1 | 38 | June 2, 2023 |
| How to use Adaptive Learning rate during training? | 4 | 801 | June 2, 2023 |
| Trainer gives error after 1st epoch and evaluation | 4 | 2195 | June 2, 2023 |
| What is the data file format of `run_ner.py`? | 0 | 27 | June 2, 2023 |
| Error on pipeline with docquery (Transformers) | 1 | 81 | June 2, 2023 |
| RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! | 6 | 2201 | June 2, 2023 |
| Unexpected results with XLMR transformer models | 0 | 26 | June 2, 2023 |
| New pipeline for zero-shot text classification | 104 | 55275 | June 2, 2023 |
| Logging_steps=1 => ValueError | 0 | 28 | June 2, 2023 |
| Loading quantized model on CPU only | 1 | 184 | June 1, 2023 |
| AssertionError: Torch not compiled with CUDA enabled | 0 | 57 | June 1, 2023 |
| Stopping generation before max_new_tokens | 0 | 28 | June 1, 2023 |
| T5 variants return Training Loss 0 and Validation loss nan while fine tuning | 6 | 784 | June 1, 2023 |
| How does GPT decide to stop generating sentences without EOS token? | 2 | 79 | June 1, 2023 |
| FP-16 training producing nans on t5-large/flan-t5-xl | 0 | 22 | June 1, 2023 |
| Huggingface Data Collator: Index put requires the source and destination dtypes match, got Float for the destination and Long for the source | 7 | 402 | June 1, 2023 |
| MLM Using AlBert - No loss error | 0 | 35 | June 1, 2023 |
| Continuing model training takes seconds in next round | 3 | 664 | June 1, 2023 |
| GPU error on LoRA for token classification | 0 | 30 | June 1, 2023 |
| Fail predict using Falcon-7B-Instruct | 0 | 115 | June 1, 2023 |
| Evaluate subset of data during training | 3 | 1010 | June 1, 2023 |
| How to make the Trainer log custom quantities? | 0 | 20 | May 31, 2023 |
| I am getting 0.0 loss value at the very first epoch of training bigscience/mt0-small seq2seq model | 0 | 25 | May 31, 2023 |
| PEFT LoRA GPT-NeoX - Backward pass failing | 3 | 545 | May 31, 2023 |