Since the data takes too long to load, it might help to share a normalized histogram of the clip durations in my dataset.
Normalized number of samples per bin:
[0.1512, 1.0000, 0.8367, 0.8265, 0.6045, 0.2867, 0.1256, 0.0611, 0.0330, 0.0189, 0.0103, 0.0057, 0.0034, 0.0017, 0.0011, 0.0006, 0.0004, 0.0002, 0.0001]
Corresponding bin edges in seconds (20 edges for the 19 bins):
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
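As a rough check on how much data a duration cutoff would keep, the posted values can be summed directly: dividing every bin by the same maximum does not change the ratios between bins. A minimal sketch in plain Python, assuming bin k covers [k, k+1) seconds and that the list was normalized by its largest bin (it peaks at exactly 1.0000); `share_below` is a hypothetical helper name.

```python
# Normalized counts per 1-second bin, copied from the histogram above
# (assumption: each value is the bin count divided by the largest bin count).
norm_counts = [0.1512, 1.0000, 0.8367, 0.8265, 0.6045, 0.2867, 0.1256,
               0.0611, 0.0330, 0.0189, 0.0103, 0.0057, 0.0034, 0.0017,
               0.0011, 0.0006, 0.0004, 0.0002, 0.0001]
# Bin edges in seconds; assumption: bin k covers [edges[k], edges[k + 1]).
edges = list(range(20))


def share_below(cutoff_s: float) -> float:
    """Approximate fraction of histogrammed clips shorter than cutoff_s.

    Scaling every count by the same constant leaves ratios unchanged, so
    normalized counts give the same fraction as raw counts (up to the
    4-decimal rounding of the posted values).
    """
    total = sum(norm_counts)
    below = sum(c for c, right in zip(norm_counts, edges[1:]) if right <= cutoff_s)
    return below / total


print(f"{share_below(6):.3f}")  # 0.934
```

With the posted numbers a < 6 s subset would keep roughly 93% of the clips counted in the histogram. Clips outside the 0 to 19 s range are not part of these bins, so they are not reflected in that figure.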
I have some clips longer than 19 s, which could be a problem. I think padding is done per batch (the default behavior), so each batch has a different shape, and the first batch may already contain long clips. I might try a subset restricted to clips shorter than 6 s.
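Padding each batch only to its longest clip (dynamic padding) is what makes batch shapes vary, and a single clip near 19 s forces its whole batch to that length. A minimal sketch in plain Python of per-batch padding combined with the < 6 s filter and optional length sorting, so clips of similar duration end up in the same batch. `pad_batch`, `filter_and_batch`, and the 16 kHz default sample rate are assumptions for illustration, not taken from the post or from any library.

```python
from typing import List, Sequence, Tuple

# A batch is (padded clips, original lengths).
Batch = Tuple[List[List[float]], List[int]]


def pad_batch(clips: Sequence[Sequence[float]], pad_value: float = 0.0) -> Batch:
    """Pad every clip to the longest clip in this batch (dynamic padding)."""
    lengths = [len(c) for c in clips]
    max_len = max(lengths)
    padded = [list(c) + [pad_value] * (max_len - len(c)) for c in clips]
    # Original lengths are kept so padded positions can be masked later.
    return padded, lengths


def filter_and_batch(
    clips: Sequence[Sequence[float]],
    batch_size: int,
    max_seconds: float,
    sample_rate: int = 16_000,  # assumption: the post does not give a rate
    sort_by_length: bool = True,
) -> List[Batch]:
    """Drop clips of max_seconds or longer, then pad batch by batch."""
    keep = [c for c in clips if len(c) / sample_rate < max_seconds]
    if sort_by_length:
        # Adjacent clips then have similar lengths, so less padding is needed.
        keep.sort(key=len)
    return [pad_batch(keep[i:i + batch_size]) for i in range(0, len(keep), batch_size)]


# Toy example: durations of 1, 7, 2, 3, 5 s at a tiny 4 Hz "sample rate".
clips = [[0.1] * (sec * 4) for sec in (1, 7, 2, 3, 5)]
for padded, lengths in filter_and_batch(clips, batch_size=2, max_seconds=6, sample_rate=4):
    print((len(padded), len(padded[0])), lengths)
# prints (2, 8) [4, 8] and then (2, 20) [12, 20]
```

If the pipeline uses PyTorch, `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)` performs the same per-batch padding on tensors inside a custom `collate_fn`. Sorting by length is a trade-off: it reduces padding but makes batch composition less random.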