Hello!
I have a dataset of 151008 sentences with only 2 classes (labels).
I wrote a sentence classifier with AutoModelForSequenceClassification, following the Hugging Face Course, and got the following results:
cointegrated/rubert-tiny2 - F1=0.9708
DeepPavlov/rubert-base-cased - F1=0.967
DeepPavlov/rubert-base-cased-conversational - F1=0.9283
These results are about what I expected.
But when I use other models (same dataset, 151008 sentences), I get the following results:
sberbank-ai/sbert_large_nlu_ru - F1=0.0
bert-base-multilingual-cased - F1=0.0
However, if I use only a quarter of the dataset (37752 sentences), I get adequate results (see the subsampling snippet below). I tried both the Trainer API and a manual training loop.
Please tell me what I'm doing wrong and how to train the model on the full dataset.
I train in the cloud (Yandex Cloud), in a JupyterLab environment, on a single V100.
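For the quarter-size runs I take the subset roughly like this (a sketch; the seed is illustrative, not necessarily what I used):

# hypothetical reconstruction of the subsampling step: keep a shuffled quarter of the training split
small_train = raw_datasets["train"].shuffle(seed=42).select(range(37752))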
Code:
import numpy as np
import torch
from datasets import DatasetDict, load_metric
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

path = '/home/jupyter/work/resources/Datasets/dataset_raw'
raw_datasets = DatasetDict.load_from_disk(path)

# Tokenize
checkpoint = "bert-base-multilingual-cased"  # also tried "DeepPavlov/rubert-base-cased" and "sberbank-ai/sbert_large_nlu_ru"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence"], truncation=True, max_length=128)

tokenized_datasets_raw = raw_datasets.map(tokenize_function, batched=True)

# Prepare for training
tokenized_datasets = tokenized_datasets_raw.remove_columns(["sentence", "idx", "level_0"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
train_dataset = tokenized_datasets["train"]  # use the prepared splits (with the "labels" column), not the raw tokenized ones
eval_dataset = tokenized_datasets["test"]
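# At this point tokenized_datasets should look roughly like this (sketch, not exact output):
# DatasetDict({
#     train: Dataset({features: ['labels', 'input_ids', 'token_type_ids', 'attention_mask'], num_rows: ...}),
#     test: Dataset({features: ['labels', 'input_ids', 'token_type_ids', 'attention_mask'], num_rows: ...}),
# })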
# Create the Trainer
device = torch.device("cuda")
metric = load_metric("f1")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2).to(device)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)  # pad each batch dynamically
training_args = TrainingArguments(output_dir="/home/jupyter/work/resources/Trash", evaluation_strategy="epoch")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,  # needed because sentences are not padded at tokenization time
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
x = trainer.predict(test_dataset=eval_dataset)
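For completeness, the manual training loop I also tried looked roughly like this (a simplified sketch following the Hugging Face Course; it reuses model, device, train_dataset, and data_collator from above, and the batch size and learning rate shown are the course defaults, not necessarily the exact values I used):

from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_scheduler

train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=8, collate_fn=data_collator)

optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)  # batch includes "labels", so outputs.loss is available
        outputs.loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()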