Mask2Former: CUDA training

I am trying to train Mask2Former with CUDA enabled, but I am encountering the following error: “RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same.” Training works fine when CUDA is disabled. My training code is below; can someone please help me fix this issue? Thank you in advance!

import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import (Mask2FormerConfig, Mask2FormerImageProcessor,
                          Mask2FormerForUniversalSegmentation)

# CFG, df, HubmapDataset and get_optimizer_and_scheduler are defined elsewhere in my notebook
device = torch.device("cuda")  # the error goes away when this is "cpu"

config = Mask2FormerConfig(feature_size=CFG.img_size[0], mask_feature_size=CFG.img_size[0])
image_processor = Mask2FormerImageProcessor(config)
model = Mask2FormerForUniversalSegmentation(config).to(device)
criterion = nn.CrossEntropyLoss()

torch.cuda.empty_cache()
epochs = 1000
scaler = GradScaler()

valid_fold = 3

train_df = df[df['fold'] != valid_fold]
valid_df = df[df['fold'] == valid_fold]
    
train_dataset = HubmapDataset(train_df, config)
valid_dataset = HubmapDataset(valid_df, config)
    
train_dataloader = DataLoader(train_dataset, batch_size=CFG.batch_size, shuffle=True)
valid_dataloader = DataLoader(valid_dataset, batch_size=2 * CFG.batch_size, shuffle=False)

optimizer, scheduler = get_optimizer_and_scheduler(model, train_dataloader, "adamw")
for epoch in range(epochs):
    model.train()
    total_train_loss = 0
    total_test_loss = 0
    pbar = tqdm(train_dataloader, desc=f"Train: Epoch {epoch + 1}", total=len(train_dataloader), mininterval=5)
    for inputs in pbar:
        optimizer.zero_grad()
        imgs = inputs[0]
        imgs = image_processor(list(imgs), return_tensors="pt", size=(512,512))
        #imgs = imgs.to(device)
        ings = image_processor.encode_inputs(pixel_values_list=imgs['pixel_values'], 
                                             task_inputs=['instance'],
                                             segmentation_maps=inputs[1], 
                                             ignore_index=0,
                                             return_tensors='pt')
        ings = ings.convert_to_tensors()
        for k in ings.keys():
            try:
                ings[k] = ings[k].to(device)
            except AttributeError:
                ings[k] = torch.stack(ings[k]).to(device)
        outputs = model(**ings)
        loss = outputs.loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scale = scaler.get_scale()
        scaler.update()
        skip_lr_scheduler = scale > scaler.get_scale()
        if scheduler is not None and not skip_lr_scheduler:
            scheduler.step()

        lr = scheduler.get_last_lr()[0] if scheduler else CFG.one_cycle_max_lr
        loss = loss.item()
            
        pbar.set_postfix({"loss": loss, "lr": lr})
        total_train_loss += loss
    total_train_loss /= len(train_dataloader)
    
    model.eval()
    pbar = tqdm(valid_dataloader, desc=f"Validation: Epoch {epoch + 1}", total=len(valid_dataloader), mininterval=5)
    for inputs in pbar:
        imgs = inputs[0]
        imgs = image_processor(list(imgs), return_tensors="pt", size=(512,512))
        #imgs = imgs.to(device)
        ings = image_processor.encode_inputs(pixel_values_list=imgs['pixel_values'], 
                                             task_inputs=['instance'],
                                             segmentation_maps=inputs[1], 
                                             ignore_index=0,
                                             return_tensors='pt')
        ings = ings.convert_to_tensors()
        # move the validation batch to the same device as the model, mirroring the train loop
        for k in ings.keys():
            try:
                ings[k] = ings[k].to(device)
            except AttributeError:
                ings[k] = torch.stack(ings[k]).to(device)
        with torch.no_grad():
            outputs = model(**ings)
        loss = outputs.loss.item()
        pbar.set_postfix({"loss": loss})
        total_test_loss += loss
    total_test_loss /= len(valid_dataloader)     
    print(f'TOTAL TRAIN LOSS: {total_train_loss} | TOTAL VALID LOSS: {total_test_loss}')

Hi,

The error

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same.

suggests that your inputs are on the GPU, but your model is not. So I’d double check whether your model is effectively on the GPU.
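For illustration, the same error can be reproduced with any module whose weights are still on the CPU; here is a minimal sketch with a plain nn.Conv2d (not Mask2Former-specific):

import torch
import torch.nn as nn

device = torch.device("cuda")

conv = nn.Conv2d(3, 8, kernel_size=3)      # weights still on the CPU
x = torch.randn(1, 3, 64, 64).to(device)   # input on the GPU

# conv(x) at this point raises:
# RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

conv = conv.to(device)  # move the weights to the same device as the input
out = conv(x)           # forward pass now works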

On the line where I create the model I move it to the device, which is ‘cuda’, but it looks like it doesn’t get moved properly. So how can I check whether the model is effectively on the GPU?

You can check by typing nvidia-smi in a terminal to see whether memory is being occupied.
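You can also check it directly from Python; a quick sketch against your model variable:

# the device of any parameter tells you where the weights live
print(next(model.parameters()).device)  # should print cuda:0 after model.to(device)

# or list anything that was left behind on the CPU
cpu_params = [n for n, p in model.named_parameters() if p.device.type != "cuda"]
print(cpu_params)  # should be an empty list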

@homo-luden @nielsr I have the exact same problem. Did either of you figure it out? My inputs are on the GPU, but the error keeps saying the model is on the CPU even though I clearly call model = model.to(device). My nvidia-smi shows GPU memory usage as well. Is this perhaps a bug? I’ve also tried downgrading transformers to version 4.27.0, without success. I am using Python 3.10. I would appreciate some help, thank you!
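(A sketch of what I mean, not my exact code:)

model = model.to(device)                             # model explicitly moved to 'cuda'
batch = {k: v.to(device) for k, v in batch.items()}  # every input tensor moved as well
print(next(model.parameters()).device)               # checking where the weights actually ended up
outputs = model(**batch)                             # this is where the mismatch error is raised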

Hi,

I’m not able to reproduce this issue. The model is correctly placed on the GPU as shown in this notebook: Google Colab.
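For reference, a minimal placement check along those lines (a sketch with random weights and no labels, not the notebook itself):

import torch
from transformers import Mask2FormerConfig, Mask2FormerForUniversalSegmentation

device = torch.device("cuda")

config = Mask2FormerConfig()
model = Mask2FormerForUniversalSegmentation(config).to(device)

# confirm the weights actually moved
print(next(model.parameters()).device)  # should print cuda:0

pixel_values = torch.randn(1, 3, 384, 384, device=device)
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

print(outputs.class_queries_logits.shape)  # forward pass runs on the GPU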