Unfortunately, I cannot say too much about my data set, but I am trying to predict hundreds of keypoints/landmarks on a given image/video feed.
I’m having great difficulty with my model architecture: I have not been able to get an accuracy above 20%.
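The post does not say how that accuracy is measured. Purely as an illustration, assuming it means the fraction of predicted keypoints that land within some pixel distance of the ground truth, it could be computed roughly like this. The function name `fraction_within_threshold`, the 5-pixel threshold, and the sample coordinates are all hypothetical, not taken from my actual pipeline.

```python
import math

def fraction_within_threshold(pred, truth, threshold_px=5.0):
    """Fraction of keypoints whose predicted (x, y) is within threshold_px of the truth.

    Hypothetical helper: the threshold and the metric definition are assumptions.
    """
    if not truth or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and the same length")
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold_px
    )
    return hits / len(truth)

# Made-up coordinates for four keypoints, in pixels.
truth = [(10.0, 10.0), (50.0, 80.0), (200.0, 300.0), (400.0, 120.0)]
pred = [(12.0, 11.0), (58.0, 80.0), (203.0, 303.0), (400.0, 119.0)]

# Three of the four predictions fall within 5 pixels; the second is 8 pixels off.
score = fraction_within_threshold(pred, truth)
```

A threshold-based score like this depends heavily on the chosen distance, so a number such as 20% is hard to interpret without knowing the threshold and the image size.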
My model is based on similar ones I found on GitHub and in a few academic papers; however, they all predicted dozens of points, versus my hundreds. I saw only two model architectures across these sources:
import tensorflow as tf
from tensorflow.keras import layers

# Architecture 1: two conv/pool stages followed by a dense head.
model = tf.keras.models.Sequential([
    layers.Conv2D(32, (3,3), padding='same', input_shape=(512,512,1)),  # 512x512 single-channel input
    layers.LeakyReLU(),
    layers.MaxPool2D((2,2)),
    layers.Conv2D(64, (3,3), padding='same'),
    layers.LeakyReLU(),
    layers.MaxPool2D((2,2)),
    layers.Flatten(),
    layers.BatchNormalization(),
    layers.Dense(128),
    layers.ReLU(),
    layers.Dropout(0.5),
    layers.Dense(64),
    layers.ReLU(),
    layers.Dropout(0.5),
    layers.Dense(501)  # linear output layer with 501 units
])
# Architecture 2: unpadded, partly strided convolutions with a softmax output.
model = tf.keras.models.Sequential([
    layers.Conv2D(32, (5,5), input_shape=(512,512,1), strides=1),
    layers.Conv2D(32, (3,3), strides=1),
    layers.MaxPool2D((2,2), padding="valid"),
    layers.BatchNormalization(),
    layers.Dropout(0.2),
    layers.Conv2D(64, (5,5), strides=2),
    layers.Conv2D(64, (5,5), strides=2),
    layers.AveragePooling2D((2,2), padding="valid"),
    layers.Flatten(),
    layers.Dense(128),
    layers.ReLU(),
    layers.Dropout(0.5),
    layers.Dense(501),
    layers.Softmax()  # normalizes the 501 outputs so they sum to 1
])
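One thing worth checking about the second architecture is what its final Softmax layer does to the outputs. The sketch below reimplements softmax in plain Python (not the Keras layer itself) and applies it to a few made-up raw values standing in for pixel coordinates. It only illustrates the general property that softmax outputs lie between 0 and 1 and sum to 1; whether that suits the targets depends on how the keypoints are encoded, which I have not described here.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs, imagined as pixel coordinates in a 512x512 image.
raw_outputs = [120.0, 340.5, 87.25, 410.0]
activated = softmax(raw_outputs)

# Every activated value is in [0, 1] and the values sum to 1,
# so the original coordinate magnitudes are not preserved.
total_activation = sum(activated)
largest_index = activated.index(max(activated))
```

Because the outputs are forced to share a total of 1, independent coordinate values cannot all be represented at once; the first architecture's plain `Dense(501)` output has no such constraint.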
My dataset is quite large, roughly 15K; this includes augmented data. I’m just looking for feedback.
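To put that dataset size next to the model size, here is a back-of-the-envelope parameter count for the first architecture above. It is a sketch in plain Python rather than a call to `model.summary()`, and it assumes default Keras behaviour: biases enabled, `padding='same'` keeping the spatial size, each 2x2 max-pool halving it, and BatchNormalization holding four values per feature. The helper names are my own.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases for a Dense layer."""
    return inputs * units + units

side = 512                               # input is 512x512x1
conv1 = conv_params(3, 1, 32)            # padding='same' keeps 512x512
side //= 2                               # MaxPool2D((2,2)) -> 256x256
conv2 = conv_params(3, 32, 64)           # padding='same' keeps 256x256
side //= 2                               # MaxPool2D((2,2)) -> 128x128

flat = side * side * 64                  # Flatten: 128 * 128 * 64 features
batch_norm = 4 * flat                    # gamma, beta, moving mean, moving variance
dense1 = dense_params(flat, 128)         # first Dense layer after Flatten
dense2 = dense_params(128, 64)
dense3 = dense_params(64, 501)

total = conv1 + conv2 + batch_norm + dense1 + dense2 + dense3
print(f"flattened features: {flat:,}")
print(f"Dense(128) parameters: {dense1:,}")
print(f"total parameters: {total:,}")
```

Under those assumptions the first Dense layer alone holds about 134 million parameters, several thousand times more than a dataset of roughly 15K, which may be part of why the model struggles.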
I’ve put this in the research section because I noticed that there are very few models out there for keypoint detection; object detection seems to be much more popular.