I recently skimmed the ToMe paper released by Meta Research. I don't personally have experience with vision transformers, and I'm wondering whether ToMe has implications for, or use in, text-based transformers, or if the underlying intuition behind token merging only applies to the spatial/audio/video modalities. I was looking through the GitHub repo earlier, and I definitely have more papers to read to understand ToMe in its entirety, but I was wondering if anyone here could give me a quick answer on this.
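For anyone else trying to build intuition: the core of ToMe is bipartite soft matching over token embeddings, which doesn't depend on the tokens being image patches. Below is a toy NumPy sketch of that idea (split tokens into two sets, pair each token in one set with its most similar token in the other by cosine similarity, and average the r most similar pairs). This is my own simplification for illustration, not the paper's exact algorithm, and the function name and details are my assumptions.

```python
import numpy as np

def merge_tokens(x, r):
    """Toy sketch of ToMe-style bipartite soft matching.

    x: (n, d) array of token embeddings; r: number of pairs to merge.
    Even-position tokens (set A) are matched to their most similar
    odd-position token (set B) by cosine similarity, and the r most
    similar pairs are merged by averaging. Simplified for illustration.
    """
    a, b = x[0::2], x[1::2]
    # Cosine similarity between every token in A and every token in B.
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T
    best = sim.argmax(axis=1)            # best partner in B for each A token
    score = sim.max(axis=1)
    merge_idx = np.argsort(-score)[:r]   # the r most similar A tokens
    keep_a = np.setdiff1d(np.arange(len(a)), merge_idx)
    merged_b = b.copy()
    for i in merge_idx:                  # fold each merged A token into B
        merged_b[best[i]] = (merged_b[best[i]] + a[i]) / 2
    return np.concatenate([a[keep_a], merged_b], axis=0)

x = np.random.default_rng(0).normal(size=(8, 4))
print(merge_tokens(x, r=2).shape)  # (6, 4): 8 tokens reduced by r=2
```

Nothing here assumes 2D structure, which is why I'd also expect the question of text applicability to come down to empirical factors (e.g. how redundant adjacent text tokens actually are) rather than to the mechanism itself.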