Hi folks,
I am trying to run the preprocessing code that was provided in Google Colab and got the error below. When I replaced the line `from transformers import AdamW, AutoTokenizer, AutoModelForSequenceClassification` with `from torch.optim import AdamW`, the error got resolved, but I wanted to understand the context behind it.
Error: ImportError: cannot import name 'AdamW' from 'transformers' (/usr/local/lib/python3.11/dist-packages/transformers/__init__.py)
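For reference, this is a minimal sketch of the swap I made; the checkpoint name, label count, and hyperparameters here are placeholders, not the ones from the course notebook:

```python
# transformers.AdamW was removed in recent releases, so take AdamW from PyTorch instead
# and keep the remaining imports from transformers.
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint and hyperparameters, just to show the optimizer wiring.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```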
Also, I got another error in the next cell for which I could not find a solution:
Error: ValueError: Invalid pattern: '**' can only be an entire path component
Kindly help me with this, thank you.
Here is the background information regarding the removal of AdamW from the Transformers library.
opened 05:51PM - 27 Mar 25 UTC
`transformers.AdamW` has been deprecated with a warning for some time and was removed in the latest version of the `transformers` package. It hasn't been necessary since an AdamW optimizer was added to torch. Please update the code to use the native torch version of AdamW.
You'll get an error like the following until that's changed:
```
Traceback (most recent call last):
File "<frozen runpy>", line 189, in _run_module_as_main
File "<frozen runpy>", line 112, in _get_module_details
File "C:\Users\test\anaconda3\envs\venv\Lib\site-packages\colbert\__init__.py", line 1, in <module>
from .trainer import Trainer
File "C:\Users\calebs\anaconda3\envs\aider\Lib\site-packages\colbert\trainer.py", line 5, in <module>
from colbert.training.training import train
File "C:\Users\test\anaconda3\envs\aider\Lib\site-packages\colbert\training\training.py", line 7, in <module>
from transformers import AdamW, get_linear_schedule_with_warmup
ImportError: cannot import name 'AdamW' from 'transformers' (C:\Users\test\anaconda3\envs\venv\Lib\site-packages\transformers\__init__.py)
```
Until then, users can run `pip install transformers==4.49.0` to continue using the colbert-ai package.
opened 08:32AM - 24 Mar 20 UTC · closed 05:52AM - 31 May 20 UTC · wontfix
# ❓ Question
I just noticed that the implementation of AdamW in HuggingFace is different from PyTorch's. The HuggingFace AdamW first applies the Adam update and then the weight decay. However, in the paper (Decoupled Weight Decay Regularization, link: https://arxiv.org/abs/1711.05101) and in the PyTorch implementation, AdamW first applies the weight decay and then the Adam update.
I was wondering if the two approaches are the same. Thanks! (In my opinion, they are not the same procedure.)
HuggingFace:
```python
for group in self.param_groups:
    for p in group["params"]:
        ...
        # Decay the first and second moment running average coefficient
        # In-place operations to update the averages at the same time
        exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
        exp_avg_sq.mul_(beta2).addcmul_(1.0 - beta2, grad, grad)
        denom = exp_avg_sq.sqrt().add_(group["eps"])

        step_size = group["lr"]
        if group["correct_bias"]:  # No bias correction for Bert
            bias_correction1 = 1.0 - beta1 ** state["step"]
            bias_correction2 = 1.0 - beta2 ** state["step"]
            step_size = step_size * math.sqrt(bias_correction2) / bias_correction1

        p.data.addcdiv_(-step_size, exp_avg, denom)

        # Just adding the square of the weights to the loss function is *not*
        # the correct way of using L2 regularization/weight decay with Adam,
        # since that will interact with the m and v parameters in strange ways.
        #
        # Instead we want to decay the weights in a manner that doesn't interact
        # with the m/v parameters. This is equivalent to adding the square
        # of the weights to the loss with plain (non-momentum) SGD.
        # Add weight decay at the end (fixed version)
        if group["weight_decay"] > 0.0:
            p.data.add_(-group["lr"] * group["weight_decay"], p.data)
```
PyTorch:
```python
for group in self.param_groups:
    for p in group['params']:
        ...
        # Perform stepweight decay
        p.data.mul_(1 - group['lr'] * group['weight_decay'])

        exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
        if amsgrad:
            max_exp_avg_sq = state['max_exp_avg_sq']
        beta1, beta2 = group['betas']

        state['step'] += 1
        bias_correction1 = 1 - beta1 ** state['step']
        bias_correction2 = 1 - beta2 ** state['step']

        # Decay the first and second moment running average coefficient
        exp_avg.mul_(beta1).add_(1 - beta1, grad)
        exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
        if amsgrad:
            # Maintains the maximum of all 2nd moment running avg. till now
            torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
            # Use the max. for normalizing running avg. of gradient
            denom = (max_exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])
        else:
            denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])

        step_size = group['lr'] / bias_correction1

        p.data.addcdiv_(-step_size, exp_avg, denom)
```
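To see how much the ordering matters, here is a rough single-step sketch (not from the issue; bias correction omitted, values arbitrary) for one scalar parameter:

```python
import math

# Arbitrary example values for one optimizer step on a single scalar parameter.
lr, wd, eps = 0.1, 0.01, 1e-8
beta1, beta2 = 0.9, 0.999
p0, grad = 1.0, 0.5

# Adam statistics after the first step are identical in both schemes.
exp_avg = (1 - beta1) * grad
exp_avg_sq = (1 - beta2) * grad * grad
adam_step = lr * exp_avg / (math.sqrt(exp_avg_sq) + eps)

# transformers-style: Adam update first, then decay the already-updated parameter.
p_hf = (p0 - adam_step) * (1 - lr * wd)

# torch.optim.AdamW-style: decay the parameter first, then apply the Adam update.
p_torch = p0 * (1 - lr * wd) - adam_step

# The two results differ by adam_step * lr * wd, so the procedures are close
# but not identical.
print(p_hf, p_torch, p_hf - p_torch)
```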
In the first cell:
!pip install -U datasets evaluate transformers==4.49.0 sentencepiece huggingface_hub fsspec
This change made it work; an older version of the `datasets` library was being used.
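If you want to double-check which versions actually ended up in the Colab runtime after that install, a quick sanity check (not part of the course code) could look like this:

```python
import importlib.metadata as md

# Print the installed version of each package touched by the install command above.
for pkg in ("transformers", "datasets", "evaluate", "huggingface_hub", "fsspec"):
    print(pkg, md.version(pkg))
```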
opened 01:46PM - 27 May 25 UTC · closed 01:26AM - 30 May 25 UTC
### Describe the bug
I have a dataset on HF [here](https://huggingface.co/datasets/kambale/luganda-english-parallel-corpus) that I've previously used to train a translation model [here](https://huggingface.co/kambale/pearl-11m-translate).
Now I changed a few hyperparameters to increase the number of tokens for the model, increase the number of Transformer layers, and so on.
However, when I try to load the dataset, this error keeps coming up. I have tried everything and re-written the code a hundred times, and it keeps coming up.
### Steps to reproduce the bug
Imports:
```bash
!pip install datasets huggingface_hub fsspec
```
Python code:
```python
from datasets import load_dataset

HF_DATASET_NAME = "kambale/luganda-english-parallel-corpus"

# Load the dataset
try:
    if not HF_DATASET_NAME or HF_DATASET_NAME == "YOUR_HF_DATASET_NAME":
        raise ValueError(
            "Please provide a valid Hugging Face dataset name."
        )

    dataset = load_dataset(HF_DATASET_NAME)
    # Omitted code as the error happens on the line above
except ValueError as ve:
    print(f"Configuration Error: {ve}")
    raise
except Exception as e:
    print(f"An error occurred while loading the dataset '{HF_DATASET_NAME}': {e}")
    raise e
```
Now, I have tried going through this [issue](https://github.com/huggingface/datasets/issues/6737) and nothing helps.
### Expected behavior
Loading the dataset successfully and performing splits (train, test, validation).
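For what it's worth, a rough sketch of that load-and-split flow, assuming the dataset loads and ships a single `train` split (the split fractions and seed here are arbitrary, not from the issue):

```python
from datasets import load_dataset

dataset = load_dataset("kambale/luganda-english-parallel-corpus")

# Carve out 10% for test, then 10% of the remainder for validation.
first = dataset["train"].train_test_split(test_size=0.1, seed=42)
second = first["train"].train_test_split(test_size=0.1, seed=42)

train_ds, val_ds, test_ds = second["train"], second["test"], first["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```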
### Environment info
From the imports, I do not install specific versions of these libraries, so the latest available version is installed.
* `datasets` version: latest
* `Platform`: Google Colab
* `Hardware`: NVIDIA A100 GPU
* `Python` version: latest
* `huggingface_hub` version: latest
* `fsspec` version: latest
Thanks a lot, John! It helped.