Transformers have millions of parameters, whereas when I use linear regression in scikit-learn it learns only one coefficient per input feature plus an intercept. Where would I use classical machine learning over transformers, if the number of parameters in classical machine learning models is so much smaller than in a transformer?
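To make the gap concrete, here is a rough sketch that counts parameters by hand. The function names are made up for illustration, and the transformer formula assumes a standard encoder layer (four attention projections with biases, a two-layer feed-forward block, and two layer norms); real models add embeddings and other pieces on top. For a single-target scikit-learn `LinearRegression`, the first count should correspond to `coef_.size` plus one for `intercept_`.

```python
def linear_regression_params(n_features: int) -> int:
    """One weight per input feature plus one intercept."""
    return n_features + 1


def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count for one standard encoder layer (assumed layout)."""
    # Query, key, value and output projections, each d_model x d_model plus bias.
    attention = 4 * (d_model * d_model + d_model)
    # Feed-forward block: d_model -> d_ff -> d_model, with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


if __name__ == "__main__":
    # Sizes below are illustrative: 1 feature vs. a 12-layer, 768-wide encoder.
    print("linear regression, 1 feature:", linear_regression_params(1))
    per_layer = transformer_layer_params(d_model=768, d_ff=3072)
    print("one encoder layer:", per_layer)
    print("12 encoder layers:", 12 * per_layer)
```

The point of the comparison is only the order of magnitude: a handful of parameters on one side, tens of millions on the other.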