How to determine dataset shape for specific needs

Dear @John6666 and other community members,

According to this HF documentation about datasets, I can “label” a dataset as I shape it into a Dataset class, for example by making nested folders or by adding JSON metadata. Is there any reading on which type I actually need for image segmentation?

For example, this SegFormer paper uses a big dataset called Cityscapes which, according to its documentation, uses coarse and fine annotations; that is, if I'm understanding it right, a pre-labelled dataset that has already been annotated with their masks.

If I want to make a new model by fine-tuning the segformer-cityscapes model on a dataset I made myself by masking it manually (not with an annotation AI), which shape works, and how do I set up the training arguments for that data? If that sentence makes any sense…

1 Like

Okay, I don’t know!
Or rather, the search for the optimal data set is a worthy research theme…
But there are quite a few good articles buried in HF articles, online papers, and github.
I’m sleepy today, so I’ll do a quick search while I work tomorrow…:sleepy:(23:38 here)

Okay, tbh I might have stretched too far here, wanting everything to be as perfect as it could be on the first try, sorry about that. I would appreciate the articles you found interesting. Don't worry, take your time and rest well. Thanks in advance :saluting_face:

1 Like

Due to a problem that has occurred with users who upload a lot of data, including myself, I will not be able to reply for a while. I thought I’d let you know.

I got back yesterday.

The problem is still the same. It's a dimension problem, I think, but it doesn't matter whether I transform the mask into BW or binary (if those are even different things; I thought they weren't, but I don't know whether Pillow's BW and OpenCV's binary are the same thing).
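For what it's worth, Pillow's black-and-white modes (“1” and “L”) and OpenCV's binary threshold are not quite the same thing by default: “1” gives True/False pixels, “L” gives 0–255 grayscale, and cv2.threshold gives 0/255. A quick way to compare them directly (the file name here is made up, not code from this thread):

```python
import cv2
import numpy as np
from PIL import Image

# Hypothetical mask file, just to illustrate the comparison
pil_l = np.array(Image.open("mask.png").convert("L"))   # grayscale, values 0..255
pil_bw = np.array(Image.open("mask.png").convert("1"))  # 1-bit, values False/True
_, cv_bin = cv2.threshold(
    cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE), 127, 255, cv2.THRESH_BINARY
)                                                        # binary, values 0 or 255

print(np.unique(pil_l), np.unique(pil_bw), np.unique(cv_bin))
```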

Should I copy the whole code here?

The error message is

{
	"name": "IndexError",
	"message": "Target 2 is out of bounds.",
	"stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[16], line 12
      2 from transformers import Trainer
      4 trainer = Trainer(
      5     model=model,
      6     args=training_args,
   (...)
      9     compute_metrics=compute_metrics,
     10 )
---> 12 trainer.train()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:2155, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2152 try:
   2153     # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
   2154     hf_hub_utils.disable_progress_bars()
-> 2155     return inner_training_loop(
   2156         args=args,
   2157         resume_from_checkpoint=resume_from_checkpoint,
   2158         trial=trial,
   2159         ignore_keys_for_eval=ignore_keys_for_eval,
   2160     )
   2161 finally:
   2162     hf_hub_utils.enable_progress_bars()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:2522, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2516 context = (
   2517     functools.partial(self.accelerator.no_sync, model=model)
   2518     if i != len(batch_samples) - 1
   2519     else contextlib.nullcontext
   2520 )
   2521 with context():
-> 2522     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2524 if (
   2525     args.logging_nan_inf_filter
   2526     and not is_torch_xla_available()
   2527     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2528 ):
   2529     # if loss is nan or inf simply add the average of previous logged losses
   2530     tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:3655, in Trainer.training_step(self, model, inputs, num_items_in_batch)
   3653         loss = self.compute_loss(model, inputs)
   3654     else:
-> 3655         loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3657 del inputs
   3658 if (
   3659     self.args.torch_empty_cache_steps is not None
   3660     and self.state.global_step % self.args.torch_empty_cache_steps == 0
   3661 ):

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\trainer.py:3709, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3707         loss_kwargs[\"num_items_in_batch\"] = num_items_in_batch
   3708     inputs = {**inputs, **loss_kwargs}
-> 3709 outputs = model(**inputs)
   3710 # Save past state if it exists
   3711 # TODO: this needs to be fixed and made cleaner later.
   3712 if self.args.past_index >= 0:

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 result = None
   1750 called_always_called_hooks = set()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\transformers\\models\\segformer\\modeling_segformer.py:809, in SegformerForSemanticSegmentation.forward(self, pixel_values, labels, output_attentions, output_hidden_states, return_dict)
    807 if self.config.num_labels > 1:
    808     loss_fct = CrossEntropyLoss(ignore_index=self.config.semantic_loss_ignore_index)
--> 809     loss = loss_fct(upsampled_logits, labels)
    810 elif self.config.num_labels == 1:
    811     valid_mask = ((labels >= 0) & (labels != self.config.semantic_loss_ignore_index)).float()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 result = None
   1750 called_always_called_hooks = set()

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\modules\\loss.py:1293, in CrossEntropyLoss.forward(self, input, target)
   1292 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1293     return F.cross_entropy(
   1294         input,
   1295         target,
   1296         weight=self.weight,
   1297         ignore_index=self.ignore_index,
   1298         reduction=self.reduction,
   1299         label_smoothing=self.label_smoothing,
   1300     )

File c:\\Users\\Lenovo\\miniconda3\\envs\\pretrain-huggingface\\Lib\\site-packages\\torch\\nn\\functional.py:3479, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3477 if size_average is not None or reduce is not None:
   3478     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3479 return torch._C._nn.cross_entropy_loss(
   3480     input,
   3481     target,
   3482     weight,
   3483     _Reduction.get_enum(reduction),
   3484     ignore_index,
   3485     label_smoothing,
   3486 )

IndexError: Target 2 is out of bounds."
}

I'm currently attempting to convert the image dataset to two dimensions (although I don't know how to keep RGB in two dimensions, currently looking it up) so that it conforms to the already two-dimensional mask, per my supervisor's recommendation. If you have any other tips I would really appreciate them. I will also check whether the Cityscapes models have matching dimensions between the masks and the training images.
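For reference, this particular error usually means the mask contains a class index that is >= the model's config.num_labels, rather than a shape problem: CrossEntropyLoss expects class indices 0..num_labels-1. A minimal sketch of the usual fix for a binary river / not-river setup (the label names and helper below are assumptions, not code from this thread):

```python
import numpy as np
from PIL import Image
from transformers import SegformerForSemanticSegmentation

# Assumed label names for a 2-class task
id2label = {0: "background", 1: "river"}
label2id = {v: k for k, v in id2label.items()}

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # the Cityscapes head has 19 classes, so it is re-initialized
)

def to_class_indices(mask_path):
    """Hypothetical helper: turn a black/white mask (0 and 255) into class indices 0 and 1."""
    mask = np.array(Image.open(mask_path).convert("L"))
    return (mask > 127).astype(np.uint8)
```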

One question: if I use the nvidia/segformer-b5-finetuned-cityscapes-1024-1024 model, do I need to conform to its dimensions as well? Like, if it's trained on 1024 x 1024, should I also use 1024 x 1024 images?

1 Like

Wait a minute. I’m not familiar with manipulating data sets, but it seems to me that you’re taking a very roundabout route…:scream:
It seems like you don’t need to do so many complicated operations…?

What do you mean… could you tell me what I should do instead…

1 Like

For example, if you have all the necessary image data, you can just place them in directories named after their labels.
I think it was similar for audio files.
For more complex cases, I think you just write the settings in JSON or YAML…
For compound files, I think you just have to fiddle with CSV or something like that.
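
A minimal sketch of that folder-based loading, assuming a hypothetical layout like data/river/... and data/not_river/...; note that this "imagefolder" layout gives classification-style labels, so for segmentation you would usually pair each image with its mask instead:

```python
from datasets import load_dataset

# "data" is a hypothetical directory with one subfolder per label
ds = load_dataset("imagefolder", data_dir="data")
print(ds["train"][0])  # {"image": <PIL.Image>, "label": 0}
```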

Create datasets

Use datasets for masking

Edit:
If we could ask nielsr, it would be 100% certain…

The thing is, I don't know what to “classify” each image as, since what I want is to segment “river” from “not river”. If you look at my dataset here, I have already made a mask of which parts are river and which are not (blackened). So the “segmentation” should be done already; I just need the model to follow the footsteps I'm tracing for it.
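
One common way to shape exactly that situation is to keep an "image" column and an "annotation" (mask) column in the same Dataset; a sketch with assumed folder names:

```python
import glob

from datasets import Dataset, Image

# Hypothetical folders; masks are assumed to be named so they sort in the same order as the images
image_paths = sorted(glob.glob("images/*.png"))
mask_paths = sorted(glob.glob("masks/*.png"))

ds = Dataset.from_dict({"image": image_paths, "annotation": mask_paths})
ds = ds.cast_column("image", Image()).cast_column("annotation", Image())
```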

Can we tag them? Because I think I'm stuck with my choices here.

1 Like

I see. I think there are a lot of people who let the big VLM do it, and there are also people who do it manually and people who combine several methods.

Are vision transformers only used in the VLM category, as in defining what is and isn't in the image? I've seen lots of people successfully segment images the way this SegFormer model does, but sadly the paper doesn't give any clarity on what I should do if I want to fine-tune their model on my own images for personal use.

1 Like

Instead, you can have an existing smart VLM help you create the dataset. Extreme case: ChatGPT (a last resort, because it costs money).

How do I do that exactly? Like, how does it differ from my current self-made dataset?

1 Like

If the dataset is complete, you shouldn’t need to do anything other than load it…:exploding_head:

As for processing the images, you can’t reduce the dimension if they are RGB, so you should convert them from RGB using PIL.Image.Image.convert(“L”) or something like that first, or process them after converting them to a numpy array.
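
Building on that, a small check (hypothetical file name) of what actually ends up in the mask after conversion, since the loss expects class indices 0..num_labels-1 rather than 0/255:

```python
import numpy as np
from PIL import Image

mask = np.array(Image.open("masks/river_0001.png").convert("L"))
print(mask.shape)       # (H, W): two-dimensional, no channel axis
print(np.unique(mask))  # [0, 1] is fine for 2 classes; [0, 255] still needs remapping

mask = (mask > 127).astype(np.uint8)  # remap 0/255 -> 0/1 if needed
```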

So the dimensions of my dataset don't need to conform to what the model has?

I found something called a colormap, although I don't know how to make that happen in Python yet.

1 Like

So the dimensions of my dataset don't need to conform to what the model has?

Yes. If you use the Hugging Face Trainer or transformers in the end, I think the preprocessor will automatically convert it to the appropriate format, using the method above. Of course there are limits, but as long as it can be displayed on the Hub, it's fine. Maybe.

Which preprocessor, to be exact?

1 Like

I think they are the same thing that is called internally.
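
In transformers that internal preprocessor is the checkpoint's image processor; a hedged sketch (file names are assumptions) showing that it resizes and normalizes for you, so your own images do not have to be 1024 x 1024:

```python
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
)

image = Image.open("images/river_0001.png")            # any size
mask = Image.open("masks/river_0001.png").convert("L")

encoded = processor(images=image, segmentation_maps=mask, return_tensors="pt")
print(encoded["pixel_values"].shape, encoded["labels"].shape)
```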