I’m trying to train an unconditional diffusion model on a greyscale image dataset. I am using diffusers_training_example.ipynb on Google Colab connected to my local GPU. When running the ‘Let’s train!’ cell I am getting this Accelerate error. Initially, I tried downgrading my Accelerate from 1.3.0 to 0.3.0 and 0.27.0 as some forums suggested but this made no difference. Any advice would be great! Thank you.
There is a possibility that it is simply a bug in Accelerate…
I see, so it seems the pull request was resolved? What do I need to do to replicate that? I would have assumed I was using the latest Accelerate with the supposed fix.
`pip install git+https://github.com/huggingface/accelerate`?
But I think it’s also merged into the pip version. Maybe it’s a different error.
Hello,
It seems like you’re encountering an issue where the `logging_dir` argument is being rejected by `Accelerator.__init__()` during your training run. This error is usually related to mismatched library versions or a change in the API.
Here are a few steps you can try to resolve the issue:
1. Ensure Version Compatibility: Since you’ve already tried downgrading `Accelerate`, ensure that all dependencies (such as `diffusers` and `transformers`) are compatible with the version of `Accelerate` you are using. Sometimes, even if `Accelerate` is downgraded, the version of `diffusers` may require a more recent version of `Accelerate`. You can update `diffusers` to the latest version with `pip install --upgrade diffusers`.

2. Check for the `logging_dir` Argument in the Code: The error suggests that the `logging_dir` argument is not expected by `Accelerator.__init__()`. This might be due to a change in the API or a version mismatch. Ensure that your code doesn’t pass this argument to the `Accelerator` initialization, or check whether it can be handled elsewhere. You can remove or modify the use of `logging_dir` by checking where it’s being passed to `Accelerator` and whether it needs to be included. For instance, after `from accelerate import Accelerator`, the call should look like `accelerator = Accelerator()` with no `logging_dir` keyword.

3. Update Accelerate: Sometimes errors like this occur because of an outdated or incompatible version of the `Accelerate` library. Ensure you’re using the latest stable version of `Accelerate`. To update, run `pip install --upgrade accelerate`.

4. Check for Additional Arguments: If the `logging_dir` argument is still needed for logging, make sure you’re passing it to the logging setup and not directly to `Accelerator.__init__()`. You might need to pass it to a different component of the training pipeline (e.g., TensorBoard, wandb, or the `Trainer` class); see the sketch after this list.

5. Restart Your Runtime: After updating or downgrading the libraries, be sure to restart your runtime in Google Colab to clear any residual state and ensure that the updated versions are actually being used.

6. Consider using `logging_dir` with the `Trainer`: If you’re using Hugging Face’s `Trainer` or another high-level API for training, the `logging_dir` argument might belong there rather than in the `Accelerator` object initialization.
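Regarding point 4: in more recent `accelerate` releases the logging directory is supplied through a `ProjectConfiguration` object rather than as a `logging_dir=` keyword on `Accelerator` itself. A minimal sketch of that workaround is below; the directory paths are placeholders, and the exact arguments your training script needs may differ:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# Placeholder paths; substitute the script's own output/logging directories.
project_config = ProjectConfiguration(project_dir="output", logging_dir="output/logs")

# Newer accelerate versions expect the logging directory here, not as a
# direct `logging_dir=` keyword on Accelerator.__init__().
accelerator = Accelerator(
    log_with="tensorboard",
    project_config=project_config,
)
```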
If these steps don’t resolve the issue, you might want to explore further compatibility details between diffusers, accelerate, and other training components you’re using.
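To see exactly which versions are active in your Colab runtime after upgrading (and after restarting it), a quick check like the following can help; the only assumption is the set of package names you care about:

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed versions of the relevant packages so mismatches are easy to spot.
for pkg in ("accelerate", "diffusers", "transformers", "torch"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```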
Hope this helps, and let me know if you need further assistance!
Hi, thanks for the extensive options! Upgrading Accelerate or Diffusers did not solve the problem. Can you expand a little on what you mean in method 2? I’m not sure I fully understand how I can check where it’s being passed using the two lines of code you provided. Also, I am using a pre-written training script, i.e. any calls to `Accelerator.__init__()` are being made within functions I have not edited. Thanks again for your help.
Here is one part of my code; I think it may be useful for you.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, get_scheduler
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch.optim import AdamW
import torch
from accelerate import Accelerator

# Initialize the Accelerator
accelerator = Accelerator()

# Load dataset and tokenizer
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize the dataset
def preprocess_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(preprocess_function, batched=True)
# Drop the raw text, rename "label" -> "labels" (the name the model's forward() expects
# so that a loss is returned), and return PyTorch tensors
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets = tokenized_datasets.with_format("torch")
train_dataset = tokenized_datasets["train"]
test_dataset = tokenized_datasets["test"]

# DataLoaders
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=8)
test_dataloader = DataLoader(test_dataset, batch_size=8)

# Model and optimizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Scheduler
num_training_steps = len(train_dataloader) * 3  # 3 epochs
lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)

# Prepare everything for Accelerate
model, optimizer, train_dataloader, test_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, test_dataloader, lr_scheduler
)

# Training loop
num_epochs = 3
for epoch in range(num_epochs):
    model.train()
    for batch in train_dataloader:
        outputs = model(**batch)
        loss = outputs.loss  # cross-entropy loss, available because "labels" is in the batch
        accelerator.backward(loss)  # Backpropagation with Accelerator
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
    print(f"Epoch {epoch + 1} completed.")

# Save the model
accelerator.wait_for_everyone()  # Synchronize across processes
unwrapped_model = accelerator.unwrap_model(model)  # Get the original model
unwrapped_model.save_pretrained("my_model")
print("Training completed and model saved!")
