I want to incorporate 4-fold cross-validation. However, how do I ensure that the train and validation splits are stratified?
Code:
```python
from datasets import load_dataset

# The first 75% of the dataset
train_75_25pct_ds = load_dataset('dataset', split='train[:75%]')
train_75_25pct_ds

# The last 25% of the dataset
validation_75_25pct_ds = load_dataset('dataset', split='train[-25%:]')
validation_75_25pct_ds
```
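The percentage slices above take contiguous rows, so they do not control the class proportions in either split. One possible approach, sketched below, swaps in scikit-learn's `StratifiedKFold` to compute stratified index sets for 4 folds. It assumes each example has a single class label. `stratified_kfold_indices` is a hypothetical helper name, and the toy labels are illustrative only.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def stratified_kfold_indices(labels, n_splits=4, seed=42):
    """Hypothetical helper: return a list of (train_idx, val_idx) pairs.

    Each validation fold keeps (roughly) the same class proportions as
    `labels`, which is assumed to hold one class label per example.
    """
    y = np.asarray(list(labels))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Only y drives the stratification, so a zero array of the right length
    # is enough as a stand-in for the features X.
    placeholder_X = np.zeros(len(y))
    return [
        (train_idx.tolist(), val_idx.tolist())
        for train_idx, val_idx in skf.split(placeholder_X, y)
    ]


if __name__ == "__main__":
    # Toy imbalanced labels: 32 examples of class 0 and 8 of class 1 (4:1).
    toy_labels = [0] * 32 + [1] * 8
    for fold, (train_idx, val_idx) in enumerate(stratified_kfold_indices(toy_labels)):
        n_pos = sum(toy_labels[i] for i in val_idx)
        print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} val, "
              f"{n_pos} of class 1 in val")
```

With a Hugging Face `Dataset` (e.g. `ds = load_dataset('dataset', split='train')`), you could pass the label column, assumed here to be named `'label'`, as `labels`. You would then build each fold with `ds.select(train_idx)` and `ds.select(val_idx)`. For a single stratified 75/25 split instead of k folds, `ds.train_test_split(test_size=0.25, stratify_by_column='label')` is an option, provided that column is a `ClassLabel` feature.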