Array data preprocessing and classification

Hello

I have integer data from 100 arrays and I want to classify these 100 inputs with an unsupervised model. These arrays are not images; they are output arrays produced every minute by a measurement system.

Do I need to preprocess these arrays before moving on to any algorithm?

Is there any similar example?

Thank you in advance


while it's not absolutely necessary, it's almost always helpful - data normalization conditions the contours of the optimization (loss) function associated with most ML models, which basically speeds up training (particularly when using first-order optimizers).

something like this - usually called standardization or z-score normalization - will do the trick:

# z-score each array; assumes `data` is a NumPy array of shape (100, n_values)
data = (data - data.mean(axis=1, keepdims=True)) / (data.std(axis=1, keepdims=True) + 1e-5)

here each array has its own mean subtracted and its own standard deviation divided out - that's what axis=1 does, normalizing every row (array) independently; if you'd rather standardize each position across the whole dataset of 100 arrays, use axis=0 instead - then apply your modeling to this normalized version of the data.
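
to make that concrete for your case, here's a minimal end-to-end sketch - assuming the 100 arrays stack into a NumPy matrix of shape (100, n_values), and using scikit-learn's KMeans purely as one example of an unsupervised model (any clustering algorithm would slot in the same way):

import numpy as np
from sklearn.cluster import KMeans

# hypothetical stand-in data: 100 arrays of 60 integer readings each
rng = np.random.default_rng(0)
data = rng.integers(0, 500, size=(100, 60)).astype(float)

# per-array standardization, as above
data = (data - data.mean(axis=1, keepdims=True)) / (data.std(axis=1, keepdims=True) + 1e-5)

# group the 100 normalized arrays into k clusters (k=3 is an arbitrary choice here)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)
print(labels)  # cluster assignment for each of the 100 arrays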

you’ll find flavors of this kind of normalization everywhere - even internal to neural networks like the transformer (which includes explicit “layer normalization” blocks - for the same reasons).
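
as a quick illustration of that last point (PyTorch assumed here purely as an example framework), a layer-norm module performs essentially the same per-row standardization:

import torch

x = torch.randn(100, 60)             # 100 arrays of 60 values
layer_norm = torch.nn.LayerNorm(60)  # normalizes over the last dimension
y = layer_norm(x)
print(y.mean(dim=1)[:3], y.std(dim=1)[:3])  # roughly zero mean, unit std per row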
