Hello, I've started fine-tuning BERT-cased, and so far I have only used datasets from Hugging Face.
Almost all of them follow the same structure: the text is split into tokens on spaces.
For example, the sentence "I am Raj Kumar"
becomes ['I','am','Raj','Kumar'].
Will I only ever get results as single-word tokens like this, or can I use a dataset format that keeps multi-word tokens together, such as ["I", "am", "Raj Kumar"],
without splitting that text?
Basically, what I want to extract is human names, product names, and other data that will be two or more words long.
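
To make the two formats concrete, here is a minimal sketch in plain Python. It assumes the dataset labels each space-split word with BIO-style tags (B- for the first word of an entity, I- for the following words); the tag names and the `merge_entities` helper are hypothetical and only for illustration, not something taken from a particular dataset or library.

```python
def merge_entities(tokens, tags):
    """Group word-level tokens into (text, label) spans using BIO tags.

    Assumption: tags look like "O", "B-PER", "I-PER" (hypothetical names).
    """
    spans = []
    for token, tag in zip(tokens, tags):
        # An "I-" tag continues the previous span if the label matches.
        if tag.startswith("I-") and spans and spans[-1][1] == tag[2:]:
            spans[-1] = (spans[-1][0] + " " + token, spans[-1][1])
        else:
            label = "O" if tag == "O" else tag[2:]
            spans.append((token, label))
    return spans


text = "I am Raj Kumar"
tokens = text.split()                 # word-level format used by the datasets
tags = ["O", "O", "B-PER", "I-PER"]   # hypothetical per-word labels
merged = merge_entities(tokens, tags)
multi_word_tokens = [span_text for span_text, _ in merged]
```

Under these assumptions, `tokens` is the split format the datasets use, and `multi_word_tokens` is the grouped format I am asking about, rebuilt from the per-word labels rather than stored in the dataset itself.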