Llama 2 & 8K Training
|
|
0
|
734
|
August 4, 2023
|
Is Llama 2 supported by the Hugging Face Text Generation Inference (TGI) Deep Learning Container on Amazon SageMaker?
|
|
0
|
537
|
August 3, 2023
|
Probabilistic One Hot Encoding
|
|
0
|
297
|
August 3, 2023
|
How can I train an MLM without labels?
|
|
0
|
256
|
August 3, 2023
|
Which version should I fine-tune?
|
|
0
|
375
|
August 2, 2023
|
Audio Spectrogram Transformer in TensorFlow
|
|
0
|
121
|
August 2, 2023
|
meta-llama/Llama-2-70b-hf filling up my disk
|
|
0
|
352
|
August 2, 2023
|
Created exe file not getting executed
|
|
0
|
561
|
August 2, 2023
|
In Donut, where is the Swin output fused with the text: 1. at the start of the BART encoder, 2. in the cross-attention (K, V from Swin, Q from attention) of the second attention of the BART encoder, or 3. directly in the decoder part of BART?
|
|
0
|
171
|
August 2, 2023
|
How can I load an LLM in 4-bit?
|
|
0
|
486
|
August 2, 2023
|
Error with gpt2 training
|
|
0
|
362
|
August 1, 2023
|
Speech to Speech Generative AI system
|
|
0
|
206
|
August 1, 2023
|
Training Roberta for RAG
|
|
0
|
577
|
August 1, 2023
|
Diff between GPTQ and NF4 with bitsandbytes
|
|
0
|
1253
|
August 1, 2023
|
GPT-NeoX inference OOM with plenty of available memory
|
|
2
|
896
|
August 1, 2023
|
Falcon for translation
|
|
0
|
257
|
August 1, 2023
|
Fine-tune a text generation model using different types of data
|
|
0
|
355
|
August 1, 2023
|
How to implement custom vision encoder-decoder?
|
|
1
|
706
|
August 1, 2023
|
Issues with fine tuning an Encoder Decoder Model
|
|
0
|
813
|
July 31, 2023
|
NCCL timeout + corrupts checkpoint/latest
|
|
1
|
2610
|
July 31, 2023
|
Soft prompt learning for BERT and GPT using Transformers
|
|
4
|
3823
|
July 31, 2023
|
Which Hugging Face summarization model supports more than 1024 tokens? Which model is more suitable for programming-related articles?
|
|
1
|
1774
|
July 31, 2023
|
PubMedQA, Preprocessing
|
|
0
|
198
|
July 30, 2023
|
RuntimeError: tensors must be contiguous when fine-tuning GPT-J-6B using PEFT LoRA
|
|
0
|
881
|
July 29, 2023
|
Class weights for Segformer loss function
|
|
1
|
932
|
July 28, 2023
|
Reproduce RoBERTa Using Huggingface Transformers
|
|
0
|
241
|
July 28, 2023
|
Training a model on a CSV
|
|
1
|
1005
|
July 28, 2023
|
Deepspeed inference and infinity offload with bitsandbytes 4bit loaded models
|
|
2
|
3860
|
July 27, 2023
|
Cannot understand the sequence length and hidden size of the BEiT model
|
|
0
|
227
|
July 27, 2023
|
AttributeError: module 'fsspec' has no attribute 'asyn'
|
|
6
|
3671
|
July 27, 2023
|