I am training a video classification model. In the evaluation phase, I segment each video into 5 parts and sample one clip from each segment. Below is the code for my `val_dataset` and `val_loader`:
```python
import os

import pytorchvideo.data
import torch
from torch.utils.data import DataLoader

val_dataset = pytorchvideo.data.Ucf101(
    data_path=os.path.join(dataset_root_path, "val"),
    clip_sampler=pytorchvideo.data.make_clip_sampler(
        # Positional args: sampling type, clip_duration, clips_per_video.
        "constant_clips_per_video", clip_duration, 5
    ),
    decode_audio=False,
    transform=val_transform,
    video_sampler=torch.utils.data.SequentialSampler,
)

# batch_size=5 because each video is segmented into 5 clips,
# so each batch from val_loader contains ONLY ONE VIDEO.
val_loader = DataLoader(val_dataset, batch_size=5, shuffle=False)
```
So how do I configure the `Trainer` to evaluate this correctly? I have looked into `batch_eval_metrics`, but is it the right approach in this case? I want the accuracy for one video to be computed from the average of its 5 clip predictions. Is there a better way to evaluate in my case?
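For concreteness, here is a minimal sketch of what I have in mind. I am assuming a recent transformers version where `batch_eval_metrics=True` makes the `Trainer` call `compute_metrics` once per eval batch with a `compute_result` flag that is `True` on the final batch; `PerVideoAccuracy` is just my placeholder name:

```python
from transformers import TrainingArguments

class PerVideoAccuracy:
    """Accumulates one prediction per video, where each eval batch
    is assumed to be the 5 clips of a single video."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result: bool):
        logits, labels = eval_pred.predictions, eval_pred.label_ids
        # Average the logits over the 5 clips, then take the argmax
        # to get a single video-level prediction.
        video_pred = logits.mean(0).argmax(-1)
        # All 5 clips share the same label, so labels[0] is the video label.
        self.correct += int(video_pred == labels[0])
        self.total += 1
        if compute_result:
            acc = self.correct / self.total
            self.correct, self.total = 0, 0
            return {"video_accuracy": acc}

args = TrainingArguments(
    output_dir="out",
    batch_eval_metrics=True,        # call compute_metrics batch by batch
    per_device_eval_batch_size=5,   # 5 clips = 1 video per eval batch
)
```

The assumption here is that each eval batch lines up with exactly one video's 5 clips, so averaging the logits inside the batch gives the video-level prediction.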
Note that `val_dataset` is a class from `pytorchvideo.data`, and it is an `IterableDataset`.
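One workaround I considered (again only a sketch, untested): since the `Trainer` builds its own eval dataloader, subclass `Trainer` and override `get_eval_dataloader` to return my `val_loader`, which preserves the one-video-per-batch layout:

```python
from transformers import Trainer

class VideoTrainer(Trainer):
    # Return my own DataLoader so each eval batch is exactly
    # the 5 clips of a single video (batch_size=5, no shuffling).
    def get_eval_dataloader(self, eval_dataset=None):
        return val_loader
```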
Thanks for any help.