Hello,
I have been working on fine-tuning a YOLOS model for fashion detection.
The model is "hustvl/yolos-small", and the training and validation data come from "detection-datasets/fashionpedia".
I am trying to filter the original dataset and add some augmentation data for the minority categories.
However, whenever I add new images with bounding boxes for the minority categories, the model predicts only the category I tried to augment after 1 epoch.
I tried working only with the original dataset and I encountered something very strange.
When I filter the dataset with the first function below, so that only 1000 bounding boxes of each category remain, the model shows the same issue after the first epoch: it predicts only 1 or 2 of the labels, and training for more epochs does not change that. However, with the second filtering function, which I thought did the same thing, training goes as expected and the model predicts all of the different labels. I have already checked whether the rewrapp_data_1 and rewrapp_data_2 functions are the cause, but when I apply only those to the original data without filtering, the predictions after 1 epoch are correct with both.
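The visible difference between the two functions is that the second one ends with a groupby on image_id followed by agg(list), so each image becomes a single row again before rewrapping. The toy sketch below only illustrates that step; the values are made up, the subset selection stands in for the per-category sampling, and it says nothing about what rewrapp_data_1 or rewrapp_data_2 do internally.

```python
import pandas as pd

# Toy stand-in for the unwrapped frame source: one row per image,
# with list-valued box columns (values are invented for illustration).
toy = pd.DataFrame({
    "image_id": [1, 2],
    "bbox_id": [[10, 11], [20]],
    "category": [[0, 1], [0]],
})

# One row per bounding box, as produced by explode() in both functions.
per_box = toy.explode(["bbox_id", "category"])

# Keep a subset of boxes, standing in for the per-category sampling.
sampled = per_box[per_box["bbox_id"].isin([10, 11, 20])]

# Extra step in the second function: regroup so each image is one row.
per_image = sampled.groupby(["image_id"]).agg(list).reset_index()

print(sampled)
print(per_image)
```

In the sampled frame the same image_id can appear on several rows, while in the grouped frame each image_id appears once with its boxes collected into lists.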
I have tried using the same seed in both of the samplings to compare the outputs directly and see if there are any differences between them, but have found none.
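A comparison of this kind can be sketched roughly as follows. The helper name summarize and the toy rows are hypothetical; the sketch only shows the idea of comparing the number of boxes per category and how often each image_id occupies a row, and it is an illustration rather than the exact check used.

```python
from collections import Counter


def summarize(rows):
    """Summarize rows given as (image_id, [category, ...]) pairs.

    Returns the number of boxes per category and the sorted list of
    image_ids that occupy more than one row.
    """
    boxes_per_category = Counter()
    rows_per_image = Counter()
    for image_id, categories in rows:
        boxes_per_category.update(categories)
        rows_per_image[image_id] += 1
    repeated_images = sorted(i for i, n in rows_per_image.items() if n > 1)
    return boxes_per_category, repeated_images


# Toy outputs (invented): the same three boxes, stored one box per row
# versus one image per row.
one_box_per_row = [(1, [0]), (1, [1]), (2, [0])]
one_image_per_row = [(1, [0, 1]), (2, [0])]

counts_a, repeated_a = summarize(one_box_per_row)
counts_b, repeated_b = summarize(one_image_per_row)

print(counts_a == counts_b)    # do the category counts match?
print(repeated_a, repeated_b)  # which image_ids span several rows?
```

Two outputs can agree on the boxes per category and still differ in how those boxes are distributed over rows, so it may be worth checking both.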
After days of looking into what might be causing the issue, I have come to the forum to see if anyone has encountered something similar or knows what might be causing it.
If any more information is needed, such as the code for training or evaluation, please let me know and I will add it to the post.
Here are the two filtering functions:
import pandas as pd
from datasets import Dataset, Image

# ALL_CATEGORIES, CONSIDERED_CATEGORIES, FASHIONPEDIA_FEATURES and the
# rewrapp_data_* helpers are defined elsewhere (not shown here).

def filter_dataset_doesnt_work(train_dataset, num_samples=1000, seed=100):
    print('Filtering dataset with', num_samples, 'samples per category ..., but this one doesnt work')
    train_df = train_dataset.to_pandas()
    # Unwrapping for filtering: one row per bounding box
    train_df_unwrapped = pd.concat(
        [train_df.drop(['objects'], axis=1), train_df['objects'].apply(pd.Series)], axis=1
    ).explode(['bbox_id', 'category', 'bbox', 'area'])
    CATEGORY_FILTER = [ALL_CATEGORIES.index(cat) for cat in CONSIDERED_CATEGORIES]
    train_df_unwrapped = train_df_unwrapped[train_df_unwrapped['category'].isin(CATEGORY_FILTER)]
    train_df_unwrapped_sampled = pd.concat([
        train_df_unwrapped[train_df_unwrapped['category'] == cat].sample(n=num_samples, random_state=seed)
        for cat in train_df_unwrapped.category.unique().tolist()
    ])
    train_dataframe_sampled = rewrapp_data_1(train_df_unwrapped_sampled)
    train_dataset = Dataset.from_pandas(train_dataframe_sampled, preserve_index=False, features=FASHIONPEDIA_FEATURES).cast_column("image", Image())
    return train_dataframe_sampled

def filter_dataset_works(train_dataset, num_samples=1000, seed=100):
    print('Filtering dataset with', num_samples, 'samples per category ..., but this one works')
    train_df = train_dataset.to_pandas()
    train_df_unwrapped = pd.concat(
        [train_df.drop(['objects'], axis=1), train_df['objects'].apply(pd.Series)], axis=1
    ).explode(['bbox_id', 'category', 'bbox', 'area'])
    CATEGORY_FILTER = [ALL_CATEGORIES.index(cat) for cat in CONSIDERED_CATEGORIES]
    train_df_unwrapped = train_df_unwrapped[train_df_unwrapped['category'].isin(CATEGORY_FILTER)]
    train_df_unwrapped_sampled = pd.concat([
        train_df_unwrapped[train_df_unwrapped['category'] == cat].sample(n=num_samples, random_state=seed)
        for cat in train_df_unwrapped.category.unique().tolist()
    ]).groupby(['image_id']).agg(list).reset_index()
    train_dataframe_sampled = rewrapp_data_2(train_df_unwrapped_sampled)
    train_dataset = Dataset.from_pandas(train_dataframe_sampled, preserve_index=False, features=FASHIONPEDIA_FEATURES).cast_column("image", Image())
    return train_dataframe_sampled

Thanks a lot to anyone who takes the time to read this post, even if you don't know the answer.