Hi guys, I am trying to train a BERT-based classifier for a problem that contains two text columns.
The data looks like this:
Text 1 | Text 2 | Label
It is a multi-class problem. I have tried following this notebook, but I am having a hard time using the two columns together. If someone could point me to the right resource, that would be really helpful.
Is your problem with the tokenizer? You can pass in two input columns in your custom tokenize function, like:
return tokenizer(examples["text1"], examples["text2"])
I’m not sure which aspect of the classification you are struggling with. I’m somewhat of a beginner myself, so more details and code examples would make it easier to help you.
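To make the two-column call above a bit more concrete, here is a toy sketch of how a BERT-style tokenizer lays out a text pair: both texts go into one sequence, `[CLS] text1 [SEP] text2 [SEP]`, and `token_type_ids` marks which segment each token belongs to. This is not the transformers implementation; `encode_pair` is a hypothetical helper, whitespace splitting stands in for WordPiece, and the column names `text1` and `text2` are just the ones used in the snippet above.

```python
# Toy illustration only: whitespace splitting stands in for WordPiece,
# and encode_pair is a hypothetical helper, not a transformers function.
def encode_pair(text1, text2):
    first = text1.lower().split()
    second = text2.lower().split()
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # Segment 0 covers [CLS], text1 and the first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return {"tokens": tokens, "token_type_ids": token_type_ids}


def tokenize_function(examples):
    # Batched form: each column arrives as a list of strings.
    pairs = zip(examples["text1"], examples["text2"])
    return [encode_pair(a, b) for a, b in pairs]


batch = {"text1": ["How are you"], "text2": ["I am fine"]}
encoded = tokenize_function(batch)
print(encoded[0]["tokens"])
print(encoded[0]["token_type_ids"])
```

With a real tokenizer you would not write any of this yourself; passing the two columns as the first and second argument is what triggers the pair layout.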
Any end-to-end solution would be helpful, because I want to understand the entire process.
It would be better if you posted the code you’ve tried so far and described the specific problem that you’re having.
These forums are usually not for tutorial-type solutions but more for getting help with specific problems.
I tried your way and it worked.
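from transformers import AutoTokenizer

# model_str, dataset2 and dataset_test are assumed to be defined earlier
tokenizer = AutoTokenizer.from_pretrained(model_str)

def tokenize_function(examples):
    return tokenizer(examples["ques_resp"], padding="max_length", truncation=True)
tokenized_datasets2 = dataset2.map(tokenize_function, batched=True)
tokenized_dataset_test = dataset_test.map(tokenize_function, batched=True)
# Same seed gives the same shuffle both times, so the two ranges do not overlap
small_train_dataset2 = tokenized_datasets2.shuffle(seed=42).select(range(0,3600))
small_eval_dataset2 = tokenized_datasets2.shuffle(seed=42).select(range(3600,3976))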
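In case it helps anyone reading later, here is a small sketch of why shuffling twice with the same seed and then selecting two different ranges gives a train/eval split with no shared rows. It uses Python's `random` module as a stand-in for the dataset's own shuffle, `shuffled_indices` is a hypothetical helper, and the row count of 3976 is an assumption taken from the upper bound of the ranges above.

```python
import random


def shuffled_indices(n, seed):
    # A fixed seed always produces the same permutation of 0..n-1.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx


n_rows = 3976  # assumed dataset size, taken from the ranges above
train_idx = shuffled_indices(n_rows, seed=42)[0:3600]
eval_idx = shuffled_indices(n_rows, seed=42)[3600:3976]

# Both slices come from the same permutation, so no row lands in both splits.
overlap = set(train_idx) & set(eval_idx)
print(len(train_idx), len(eval_idx), len(overlap))
```

If the seeds were different, the two ranges would be cut from different permutations and some rows could end up in both splits.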