May I know what's causing the issue? Even when I tried the TensorFlow code on Colab, the same issue persists.
Is it a library version issue?
I am trying to load:

from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets
but I get this error message:

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Downloading readme: 35.3k/? [00:00<00:00, 445kB/s]
ValueError                                Traceback (most recent call last)
/tmp/ipython-input-1-3226990657.py in <cell line: 0>()
      1 from datasets import load_dataset
      2
----> 3 raw_datasets = load_dataset("glue", "mrpc")
      4 raw_datasets

11 frames
/usr/local/lib/python3.11/dist-packages/fsspec/utils.py in glob_translate(pat)
    729             continue
    730         elif "**" in part:
--> 731             raise ValueError(
    732                 "Invalid pattern: '**' can only be an entire path component"
    733             )

ValueError: Invalid pattern: '**' can only be an entire path component
What worked for me was to run
!pip install --upgrade datasets
raw_datasets = load_dataset("glue", "mrpc")
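One caveat (my own note, not from the original reply): in Colab the upgrade usually only takes effect after the runtime is restarted, because the old datasets version may already be imported. A minimal sequence would be:

# Upgrade, then restart the runtime (Runtime > Restart session) before re-running the import.
!pip install --upgrade datasets

# After the restart:
from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets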
I am getting this error:

/usr/local/lib/python3.11/dist-packages/fsspec/utils.py in glob_translate(pat)
    729             continue
    730         elif "**" in part:
--> 731             raise ValueError(
    732                 "Invalid pattern: '**' can only be an entire path component"
    733             )

ValueError: Invalid pattern: '**' can only be an entire path component
In Google Colab, I tried reinstalling the libraries and clearing .cache/huggingface/datasets/glue,
but the error still persists.
Oh… So we need !pip install -U datasets huggingface_hub[hf_xet] fsspec
…
The issue was caused by an incompatibility between the versions of datasets, huggingface-hub, and fsspec.
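If it is unclear which combination you ended up with, a quick sanity check (a sketch I am adding here, not part of the original answer) is to print the installed versions after restarting the runtime:

# Print the versions that load_dataset will actually use; all three
# packages need to be mutually compatible, so upgrading the trio together
# is the simplest way to align them.
import datasets
import fsspec
import huggingface_hub

print("datasets:", datasets.__version__)
print("fsspec:", fsspec.__version__)
print("huggingface_hub:", huggingface_hub.__version__)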
What is the problem with this code? It gives me an error.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)
import numpy as np
import evaluate

# 1) Data & tokenizer
raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
)

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    metric = evaluate.load("glue", "mrpc")
    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments("test_trainer")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
How about this…? (Trainer)
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
import evaluate

# 1) Data & tokenizer
raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

metric = evaluate.load("glue", "mrpc")  # metric loaded once outside for efficiency

def compute_metrics(eval_preds):
    preds = np.argmax(eval_preds.predictions, axis=-1)
    labels = eval_preds.label_ids
    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments(output_dir="test_trainer")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
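One small follow-up (my addition, not part of the original reply): with the default TrainingArguments no evaluation is scheduled during training, so compute_metrics is only used if you run evaluation explicitly, for example:

# Evaluate on the MRPC validation split; this is where compute_metrics
# actually gets called and returns the metrics for the task.
eval_results = trainer.evaluate()
print(eval_results)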
Hi,
I wasn't able to find anyone mentioning it, but the chart in "Understanding Learning Curves" > "Accuracy Curves" seems wrong. Shouldn't it be flipped, with accuracy on the Y axis?
Thanks,
Masato