Topic | Replies | Views | Activity
Trying to process longer documents with BERT-based models | 0 | 619 | March 8, 2021
A standard way to have the `generate` method of the `GenerateMixin` only output the generated tokens | 0 | 618 | November 19, 2023
Scaling Mistral-7B on AWS SageMaker With Multiple Replica Endpoints | 0 | 617 | January 19, 2024
Ranking model poor results, looking for improvement | 0 | 617 | June 25, 2022
BertForSequenceClassification - ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 35])) | 0 | 615 | July 11, 2023
Instruction Fine-Tuning StarCoder Model | 0 | 615 | June 28, 2023
Fused Kernel Operations | 0 | 615 | July 26, 2022
Equivalent for ignore token for Vision Transformers? | 0 | 614 | May 12, 2022
How to Reduce Latency When Using Tool Calling in LLamaAndroid? | 2 | 63 | February 25, 2025
Fine-tune a translation model on monolingual data | 1 | 433 | June 16, 2022
Train model for Question Answering | 3 | 306 | May 6, 2024
Creating a login; Protecting Chat-Ui route | 0 | 607 | September 28, 2023
Slow inference while performing translation | 0 | 604 | June 10, 2022
FAISS similarity search error | 0 | 603 | April 20, 2024
Finetuning with SFTtrainer | 1 | 426 | June 12, 2024
Weight decay rate in create optimizer tensorflow | 0 | 598 | April 6, 2022
AI website chatbots | 2 | 345 | July 7, 2024
How to sync Hugging Face model commits with GitHub? | 8 | 112 | April 10, 2025
Access Hidden States in Custom Loss Function in Finetuning | 0 | 106 | November 18, 2024
TokenizerFast with various units (e.g., BPE, wordpiece, word, character, unigram) | 1 | 421 | November 12, 2020
Batch processing for stream dataset | 0 | 592 | August 12, 2022
Converting AlignTTS (text-to-speech) model to ONNX | 0 | 589 | April 18, 2023
Setting the no_answer probability in the squad_v2 metric | 0 | 586 | February 21, 2022
Max_step and generative dataset | 0 | 585 | November 5, 2021
Use VisionTextDualEncoder for image-text retrieval | 0 | 580 | December 13, 2022
How to set dropout range on classifier layer using hyperparameter search? | 0 | 580 | June 10, 2022
GPU utilization up and down | 0 | 579 | May 13, 2022
Why does this site scam people? | 1 | 406 | January 2, 2023
How to parallelize inference on a quantized model | 5 | 234 | October 7, 2024
I have a question about giving Image condition at diffusion models | 0 | 572 | December 11, 2023
Why is there no Output when IDEFICS based model is run on CUDA? | 0 | 571 | December 26, 2023
Reformer - attention data format | 1 | 399 | June 29, 2023
Correct way to use pre-trained models | 1 | 398 | August 27, 2021
Question about dataset from TFRecord files | 0 | 559 | September 21, 2023
How to get vocabulary embedding matrix from an LLM? | 1 | 394 | December 1, 2023
Generation is always CPU limited | 0 | 557 | April 21, 2023
GPT2.generate() with custom inputs_embeds argument returning tensor (1*max_length) instead of (batch_size*max_length) | 0 | 555 | April 19, 2022
Model.generate use_cache=True generates different results than use_cache=False | 3 | 155 | March 4, 2025
Dataset download faster | 1 | 389 | March 29, 2024
DeepSpeed and RayTune | 0 | 547 | September 26, 2021
Error when using transform function of pixel_values | 0 | 546 | July 1, 2023
Trainer.train() will cause PretrainedConfig default construct | 1 | 217 | February 29, 2024
CUDA OOM on model(inputs) but not on model.generate(inputs), but doesn't generate use model(inputs)? | 4 | 244 | May 4, 2024
Data Parallelism for multi-GPUs Inference | 0 | 545 | October 26, 2022
Original Bert Pretraining | 0 | 545 | January 10, 2022
Adding a new mask_token for BERT-like models/tokenizers | 0 | 542 | May 26, 2023
ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor | 0 | 542 | April 28, 2023
Why are some NLI models giving logits in opposite positions to expected labels? | 0 | 542 | March 11, 2022
T5: why do we have more tokens expressed via cross attentions than the decoded sequence? | 1 | 383 | February 21, 2023
Create Custom Loss function for transformers using a diffusion model and CLIP | 0 | 541 | February 19, 2024