AttributeError: module 'fsspec' has no attribute 'asyn'

I am getting this error while using a BART model. I have already fine-tuned the model with Seq2SeqTrainer, with the output directory set to a path on my Google Drive. Now I am trying to resume from the last checkpoint using the 'resume_from_checkpoint' argument, but I get this error. Here is my code; the dataset I am using is an IterableDataset.

from transformers import EarlyStoppingCallback, Seq2SeqTrainer, Seq2SeqTrainingArguments

tokenized_datasets = tokenized_datasets.with_format("torch")
training_args = Seq2SeqTrainingArguments(
    output_dir="/content/gdrive/My Drive/Colab Notebooks/Code/models",
    evaluation_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    weight_decay=0.01,
    save_total_limit=1,
    num_train_epochs=5,
    predict_with_generate=True,
    fp16=True,
    save_strategy="epoch",
    metric_for_best_model="eval_rouge1",
    greater_is_better=True,
    seed=41,
    generation_max_length=max_target_length,
    max_steps=10000,  # a positive max_steps overrides num_train_epochs
    load_best_model_at_end=True,
    resume_from_checkpoint="/content/gdrive/My Drive/Colab Notebooks/Code/models",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.0)],
)

trainer.train()

Just chiming in that I am seeing the same issue with Datasets.

I solved this problem by adding 'asyn' to the "__init__.py" of the 'fsspec' library, like this:

from . import asyn

__all__ = [
    'asyn',
    ...
]

This is because the latest version of 'fsspec' does not allow direct access like "fsspec.asyn".
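
A possible alternative that avoids editing files inside the installed library: importing the submodule explicitly should have a similar effect, because Python binds an imported submodule as an attribute of its parent package. This is only a sketch. The helper name `ensure_submodule` is made up for illustration, and it assumes the import runs in the same process, before the code that fails.

```python
import importlib
import sys


def ensure_submodule(package: str, submodule: str):
    """Import `package.submodule` so it is reachable as an attribute of `package`."""
    module = importlib.import_module(f"{package}.{submodule}")
    # After a successful import, Python sets the submodule on its parent package.
    assert getattr(sys.modules[package], submodule) is module
    return module


try:
    # Assumption: fsspec is installed in the environment that raises the error.
    ensure_submodule("fsspec", "asyn")
except ImportError:
    # fsspec (or this submodule) is not available here, so there is nothing to do.
    pass
```

Running this once at the top of the notebook, before creating the trainer, is the intended usage. It is the same thing as writing `import fsspec.asyn` by hand.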

An alternative is to use "interleave_datasets" from the "datasets" library, like "interleave_datasets([train_dataset])". (Interleaving a single dataset gives the same data as the original dataset, but it behaves differently internally and works well.)

Hi,

I am also receiving this same error when streaming datasets.

Best,

Enrico

You saved my life. Many many thanks for this god-like solution.

In my case, I installed a specific version of the datasets library:
pip install datasets==2.11.0
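
To check which versions are actually installed after pinning (in Colab, a runtime restart may be needed before a newly installed version is picked up), something like the following could be used. It is a minimal sketch: the helper name `installed_version` is hypothetical, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# Report the two libraries involved in this thread.
for pkg in ("datasets", "fsspec"):
    print(pkg, installed_version(pkg))
```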