May I know what's causing the issue? Even when I tried the TensorFlow code on Colab, the same issue persists.
Is it a library version issue?
I am trying to load:

from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets
but I get this error message:

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Downloading readme: 35.3k/? [00:00<00:00, 445kB/s]
ValueError                                Traceback (most recent call last)
/tmp/ipython-input-1-3226990657.py in <cell line: 0>()
      1 from datasets import load_dataset
      2
----> 3 raw_datasets = load_dataset("glue", "mrpc")
      4 raw_datasets

11 frames
/usr/local/lib/python3.11/dist-packages/fsspec/utils.py in glob_translate(pat)
    729             continue
    730         elif "**" in part:
--> 731             raise ValueError(
    732                 "Invalid pattern: '**' can only be an entire path component"
    733             )

ValueError: Invalid pattern: '**' can only be an entire path component
What worked for me was to run
!pip install --upgrade datasets
raw_datasets = load_dataset("glue", "mrpc")
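One caveat (my own note, not from the original reply): in Colab the upgrade usually only takes effect after the runtime is restarted, because the old datasets version may already be imported. A minimal sequence would be:

# Upgrade, then restart the runtime (Runtime > Restart session) before re-running the import.
!pip install --upgrade datasets

# After the restart:
from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets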
I am getting this error:

/usr/local/lib/python3.11/dist-packages/fsspec/utils.py in glob_translate(pat)
    729             continue
    730         elif "**" in part:
--> 731             raise ValueError(
    732                 "Invalid pattern: '**' can only be an entire path component"
    733             )

ValueError: Invalid pattern: '**' can only be an entire path component
In Google Colab, I tried reinstalling the libraries and clearing .cache/huggingface/datasets/glue,
but the error still persists.
Oh… So we need !pip install -U datasets huggingface_hub[hf_xet] fsspec
…
The issue was caused by an incompatibility between the versions of datasets, huggingface-hub, and fsspec.
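If it is unclear which combination you ended up with, a quick sanity check (a sketch I am adding here, not part of the original answer) is to print the installed versions after restarting the runtime:

# Print the versions that load_dataset will actually use; all three
# packages need to be mutually compatible, so upgrading the trio together
# is the simplest way to align them.
import datasets
import fsspec
import huggingface_hub

print("datasets:", datasets.__version__)
print("fsspec:", fsspec.__version__)
print("huggingface_hub:", huggingface_hub.__version__)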
What is the problem with this code? It gives me an error.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)
import numpy as np
import evaluate

# 1) Data & tokenizer
raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
)

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    metric = evaluate.load("glue", "mrpc")
    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments("test_trainer")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
How about this…? (Trainer)
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
import evaluate

# 1) Data & tokenizer
raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

metric = evaluate.load("glue", "mrpc")  # metric loaded once outside for efficiency

def compute_metrics(eval_preds):
    preds = np.argmax(eval_preds.predictions, axis=-1)
    labels = eval_preds.label_ids
    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments(output_dir="test_trainer")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
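One small follow-up (my addition, not part of the original reply): with the default TrainingArguments no evaluation is scheduled during training, so compute_metrics is only used if you run evaluation explicitly, for example:

# Evaluate on the MRPC validation split; this is where compute_metrics
# actually gets called and returns the metrics for the task.
eval_results = trainer.evaluate()
print(eval_results)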
Hi,
I wasn't able to find anyone mentioning it, but the chart in "Understanding Learning Curves" > "Accuracy Curves" seems wrong. Shouldn't it be flipped, with accuracy on the Y axis?
Thanks,
Masato