How to determine dataset shape for specific needs

Dear @John6666 and other community members,

According to this HF documentation about datasets, I can “label” a dataset as I shape it into a Dataset class, for example by making nested folders or by adding JSON metadata. Is there any reading on which type I actually need for image segmentation?

For example, this SegFormer paper uses a big dataset called Cityscapes which, according to its documentation, uses coarse and fine annotations; that is, if I'm understanding it right, a pre-labelled dataset that has already been annotated with their masks.

If I want to make a new model by fine-tuning the segformer-cityscapes model on a dataset I made myself by masking it manually (not with an annotation AI), which shape works, and how do I set up the training arguments for that data? If that sentence makes any sense…

1 Like

Okay, I don’t know!
Or rather, the search for the optimal data set is a worthy research theme…
But there are quite a few good articles buried in HF articles, online papers, and github.
I’m sleepy today, so I’ll do a quick search while I work tomorrow…:sleepy:(23:38 here)

Okay, tbh I might have stretched too far here, wanting everything to be as perfect as it could be on the first try, sorry about that. I would appreciate the articles you found interesting. Don't worry, take your time and rest well. Thanks in advance :saluting_face:

1 Like

Due to a problem that has occurred with users who upload a lot of data, including myself, I will not be able to reply for a while. I thought I’d let you know.

I got back yesterday.

The problem is still the same. It's a dimension problem, I think, but it doesn't matter whether I transform the mask into BW or binary (if those are even different things; I thought they weren't, but I don't know whether Pillow's BW and OpenCV's binary are the same thing).
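For what it's worth, Pillow's black-and-white modes (“1” and “L”) and OpenCV's binary threshold are not quite the same thing by default: “1” gives True/False pixels, “L” gives 0–255 grayscale, and cv2.threshold gives 0/255. A quick way to compare them directly (the file name here is made up, not code from this thread):

```python
import cv2
import numpy as np
from PIL import Image

# Hypothetical mask file, just to illustrate the comparison
pil_l = np.array(Image.open("mask.png").convert("L"))   # grayscale, values 0..255
pil_bw = np.array(Image.open("mask.png").convert("1"))  # 1-bit, values False/True
_, cv_bin = cv2.threshold(
    cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE), 127, 255, cv2.THRESH_BINARY
)                                                        # binary, values 0 or 255

print(np.unique(pil_l), np.unique(pil_bw), np.unique(cv_bin))
```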

Should I copy the whole code here?

The error message is

{
	"name": "IndexError",
	"message": "Target 2 is out of bounds.",
	"stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[16], line 12
      2 from transformers import Trainer
      4 trainer = Trainer(
      5     model=model,
      6     args=training_args,
   (...)
      9     compute_metrics=compute_metrics,
     10 )
---> 12 trainer.train()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:2155, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2152 try:
   2153     # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
   2154     hf_hub_utils.disable_progress_bars()
-> 2155     return inner_training_loop(
   2156         args=args,
   2157         resume_from_checkpoint=resume_from_checkpoint,
   2158         trial=trial,
   2159         ignore_keys_for_eval=ignore_keys_for_eval,
   2160     )
   2161 finally:
   2162     hf_hub_utils.enable_progress_bars()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:2522, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2516 context = (
   2517     functools.partial(self.accelerator.no_sync, model=model)
   2518     if i != len(batch_samples) - 1
   2519     else contextlib.nullcontext
   2520 )
   2521 with context():
-> 2522     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2524 if (
   2525     args.logging_nan_inf_filter
   2526     and not is_torch_xla_available()
   2527     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2528 ):
   2529     # if loss is nan or inf simply add the average of previous logged losses
   2530     tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:3655, in Trainer.training_step(self, model, inputs, num_items_in_batch)
   3653         loss = self.compute_loss(model, inputs)
   3654     else:
-> 3655         loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3657 del inputs
   3658 if (
   3659     self.args.torch_empty_cache_steps is not None
   3660     and self.state.global_step % self.args.torch_empty_cache_steps == 0
   3661 ):

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:3709, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3707         loss_kwargs[\"num_items_in_batch\"] = num_items_in_batch
   3708     inputs = {**inputs, **loss_kwargs}
-> 3709 outputs = model(**inputs)
   3710 # Save past state if it exists
   3711 # TODO: this needs to be fixed and made cleaner later.
   3712 if self.args.past_index >= 0:

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 result = None
   1750 called_always_called_hooks = set()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\models\\segformer\\modeling_segformer.py:809, in SegformerForSemanticSegmentation.forward(self, pixel_values, labels, output_attentions, output_hidden_states, return_dict)
    807 if self.config.num_labels > 1:
    808     loss_fct = CrossEntropyLoss(ignore_index=self.config.semantic_loss_ignore_index)
--> 809     loss = loss_fct(upsampled_logits, labels)
    810 elif self.config.num_labels == 1:
    811     valid_mask = ((labels >= 0) & (labels != self.config.semantic_loss_ignore_index)).float()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 result = None
   1750 called_always_called_hooks = set()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\loss.py:1293, in CrossEntropyLoss.forward(self, input, target)
   1292 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1293     return F.cross_entropy(
   1294         input,
   1295         target,
   1296         weight=self.weight,
   1297         ignore_index=self.ignore_index,
   1298         reduction=self.reduction,
   1299         label_smoothing=self.label_smoothing,
   1300     )

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\functional.py:3479, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3477 if size_average is not None or reduce is not None:
   3478     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3479 return torch._C._nn.cross_entropy_loss(
   3480     input,
   3481     target,
   3482     weight,
   3483     _Reduction.get_enum(reduction),
   3484     ignore_index,
   3485     label_smoothing,
   3486 )

IndexError: Target 2 is out of bounds."
}

I'm currently attempting to convert the image dataset to two dimensions (although I don't know how to keep RGB in two dimensions, currently looking it up) so that it conforms to the already two-dimensional mask, per my supervisor's recommendation. If you have any other tips I would really appreciate them. I will also check whether the Cityscapes models have matching dimensions between the masks and the training images.
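For reference, this particular error usually means the mask contains a class index that is >= the model's config.num_labels, rather than a shape problem: CrossEntropyLoss expects class indices 0..num_labels-1. A minimal sketch of the usual fix for a binary river / not-river setup (the label names and helper below are assumptions, not code from this thread):

```python
import numpy as np
from PIL import Image
from transformers import SegformerForSemanticSegmentation

# Assumed label names for a 2-class task
id2label = {0: "background", 1: "river"}
label2id = {v: k for k, v in id2label.items()}

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # the Cityscapes head has 19 classes, so it is re-initialized
)

def to_class_indices(mask_path):
    """Hypothetical helper: turn a black/white mask (0 and 255) into class indices 0 and 1."""
    mask = np.array(Image.open(mask_path).convert("L"))
    return (mask > 127).astype(np.uint8)
```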

One question: if I use the nvidia/segformer-b5-finetuned-cityscapes-1024-1024 model, do I need to conform to its dimensions as well? Like, if it's trained on 1024 x 1024, should I also use 1024 x 1024 images?

1 Like

Wait a minute. I’m not familiar with manipulating data sets, but it seems to me that you’re taking a very roundabout route…:scream:
It seems like you don’t need to do so many complicated operations…?

What do you mean… could you tell me what I should do instead…

1 Like

For example, if you have all the necessary image data, you can just place them in directories named after their labels.
I think it was similar for audio files.
For more complex cases, I think you just write the settings in JSON or YAML…
For compound files, I think you just have to fiddle with CSV or something like that.
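
A minimal sketch of that folder-based loading, assuming a hypothetical layout like data/river/... and data/not_river/...; note that this "imagefolder" layout gives classification-style labels, so for segmentation you would usually pair each image with its mask instead:

```python
from datasets import load_dataset

# "data" is a hypothetical directory with one subfolder per label
ds = load_dataset("imagefolder", data_dir="data")
print(ds["train"][0])  # {"image": <PIL.Image>, "label": 0}
```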

Create datasets

Use datasets for masking

Edit:
If we could ask nielsr, it would be 100% certain…

The thing is, I don't know what to “classify” each image as, since what I want is to segment “river” from “not river”. If you look at my dataset here, I have already made a mask of which parts are river and which are not (blackened). So the “segmentation” should be done already; I just need the model to follow the footsteps I'm tracing for it.
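
One common way to shape exactly that situation is to keep an "image" column and an "annotation" (mask) column in the same Dataset; a sketch with assumed folder names:

```python
import glob

from datasets import Dataset, Image

# Hypothetical folders; masks are assumed to be named so they sort in the same order as the images
image_paths = sorted(glob.glob("images/*.png"))
mask_paths = sorted(glob.glob("masks/*.png"))

ds = Dataset.from_dict({"image": image_paths, "annotation": mask_paths})
ds = ds.cast_column("image", Image()).cast_column("annotation", Image())
```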

Can we tag them? Because I think I'm stuck with my choices here.

1 Like

I see. I think there are a lot of people who let the big VLM do it, and there are also people who do it manually and people who combine several methods.

Are vision transformers only used in the VLM category, as in defining what is and isn't in the image? I've seen lots of people successfully segment images the way this SegFormer model does, but sadly the paper doesn't give any clarity on what I should do if I want to fine-tune their model on my own images for personal use.

1 Like

Instead, you can have an existing smart VLM help you create the dataset. Extreme case: ChatGPT (a last resort, because it costs money).

How do I do that exactly? Like, how does it differ from my current self-made dataset?

1 Like

If the dataset is complete, you shouldn’t need to do anything other than load it…:exploding_head:

As for processing the images, you can’t reduce the dimension if they are RGB, so you should convert them from RGB using PIL.Image.Image.convert(“L”) or something like that first, or process them after converting them to a numpy array.
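
Building on that, a small check (hypothetical file name) of what actually ends up in the mask after conversion, since the loss expects class indices 0..num_labels-1 rather than 0/255:

```python
import numpy as np
from PIL import Image

mask = np.array(Image.open("masks/river_0001.png").convert("L"))
print(mask.shape)       # (H, W): two-dimensional, no channel axis
print(np.unique(mask))  # [0, 1] is fine for 2 classes; [0, 255] still needs remapping

mask = (mask > 127).astype(np.uint8)  # remap 0/255 -> 0/1 if needed
```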

So the dimensions of my dataset don't need to conform to what the model has?

I found something called a colormap, although I don't know how to make that happen in Python yet.

1 Like

So the dimensions of my dataset don't need to conform to what the model has?

Yes. If you use the Hugging Face Trainer or transformers in the end, I think the preprocessor will automatically convert it to the appropriate format, using the method above. Of course there are limits, but as long as it can be displayed on the Hub, it's fine. Maybe.

Which preprocessor, to be exact?

1 Like

I think they are the same thing that is called internally.
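
In transformers that internal preprocessor is the checkpoint's image processor; a hedged sketch (file names are assumptions) showing that it resizes and normalizes for you, so your own images do not have to be 1024 x 1024:

```python
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
)

image = Image.open("images/river_0001.png")            # any size
mask = Image.open("masks/river_0001.png").convert("L")

encoded = processor(images=image, segmentation_maps=mask, return_tensors="pt")
print(encoded["pixel_values"].shape, encoded["labels"].shape)
```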