Hey!
The tokenizer options in HuggingFace are extremely useful and easy to work with.
I’d like to build a custom pipeline that:
- takes custom data as input, not something from Datasets (I know how to do this in HF).
- tokenizes it with a HF tokenizer pipeline (this I know how to do, too).
- feeds the tokenized output into a custom PyTorch model (not pretrained, not any specific architecture).
- optimizes with any viable PyTorch loss function.
So basically, I want to stay within the HF framework for the text preprocessing and tokenization, while still doing whatever I wish in PyTorch afterwards…
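For what it's worth, here is a minimal sketch of what I have in mind. The `TinyClassifier` model, its dimensions, and the random `batch` dict are placeholders of my own: in practice the `input_ids`/`attention_mask` tensors would come straight from a HF tokenizer call like `tokenizer(texts, padding=True, truncation=True, return_tensors="pt")`.

```python
import torch
import torch.nn as nn

# Placeholder for tokenizer output: in practice this dict would come from a
# HF tokenizer, e.g. tokenizer(texts, padding=True, return_tensors="pt").
vocab_size, batch_size, seq_len = 30522, 4, 16
batch = {
    "input_ids": torch.randint(0, vocab_size, (batch_size, seq_len)),
    "attention_mask": torch.ones(batch_size, seq_len, dtype=torch.long),
}

class TinyClassifier(nn.Module):
    """A custom (non-pretrained, non-Transformer) model over token ids."""
    def __init__(self, vocab_size, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        emb = self.embed(input_ids)                  # (B, T, D)
        mask = attention_mask.unsqueeze(-1).float()  # (B, T, 1)
        pooled = (emb * mask).sum(1) / mask.sum(1)   # mean-pool over real tokens
        return self.fc(pooled)                       # (B, num_classes)

model = TinyClassifier(vocab_size)
labels = torch.randint(0, 2, (batch_size,))
logits = model(**batch)                       # dict keys match forward() args
loss = nn.CrossEntropyLoss()(logits, labels)  # any PyTorch loss would do here
loss.backward()
```

The only coupling point between the two frameworks would be that the tokenizer's output keys match the model's `forward()` signature, so `model(**batch)` just works.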
Is this feasible?
Thanks!