Hey!
The tokenizer options in HuggingFace are extremely useful and easy to work with.
I’d like to build a custom pipeline that:
- takes custom data as input, not something from Datasets (I know how to do this in HF).
- tokenizes it with a HF tokenizer pipeline (this I know how to do, too).
- feeds the tokenized output into a custom PyTorch model (not pretrained, not any specific architecture).
- optimizes with any viable PyTorch loss function.
So basically, I want to stay within the HF framework for the text preprocessing and tokenization, while still doing whatever I wish in PyTorch afterwards…
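For what it's worth, here is a minimal sketch of what I have in mind. The `TinyClassifier` model, its dimensions, and the random `batch` dict are placeholders of my own: in practice the `input_ids`/`attention_mask` tensors would come straight from a HF tokenizer call like `tokenizer(texts, padding=True, truncation=True, return_tensors="pt")`.

```python
import torch
import torch.nn as nn

# Placeholder for tokenizer output: in practice this dict would come from a
# HF tokenizer, e.g. tokenizer(texts, padding=True, return_tensors="pt").
vocab_size, batch_size, seq_len = 30522, 4, 16
batch = {
    "input_ids": torch.randint(0, vocab_size, (batch_size, seq_len)),
    "attention_mask": torch.ones(batch_size, seq_len, dtype=torch.long),
}

class TinyClassifier(nn.Module):
    """A custom (non-pretrained, non-Transformer) model over token ids."""
    def __init__(self, vocab_size, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        emb = self.embed(input_ids)                  # (B, T, D)
        mask = attention_mask.unsqueeze(-1).float()  # (B, T, 1)
        pooled = (emb * mask).sum(1) / mask.sum(1)   # mean-pool over real tokens
        return self.fc(pooled)                       # (B, num_classes)

model = TinyClassifier(vocab_size)
labels = torch.randint(0, 2, (batch_size,))
logits = model(**batch)                       # dict keys match forward() args
loss = nn.CrossEntropyLoss()(logits, labels)  # any PyTorch loss would do here
loss.backward()
```

The only coupling point between the two frameworks would be that the tokenizer's output keys match the model's `forward()` signature, so `model(**batch)` just works.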
Is this feasible?
Thanks!