Array data preprocessing and classification

Hello

I have integer data from 100 arrays and I want to classify these 100 inputs with an unsupervised model. These arrays are not images; they are output arrays produced every minute by a measurement system.

Do I need to preprocess these arrays before moving on to any algorithm?

Is there any similar example?

Thank you in advance


while it's not absolutely necessary, it's almost always helpful - data normalization conditions the contours of the optimization (loss) function associated with most ML models, which basically speeds up training (particularly when using first-order optimizers).

something like this - usually called standardization or z-score normalization - will do the trick:

# z-score each array; assumes `data` is a NumPy array of shape (100, n_values)
data = (data - data.mean(axis=1, keepdims=True)) / (data.std(axis=1, keepdims=True) + 1e-5)

here each array has its own mean subtracted and its own standard deviation divided out - that's what axis=1 does, normalizing every row (array) independently; if you'd rather standardize each position across the whole dataset of 100 arrays, use axis=0 instead - then apply your modeling to this normalized version of the data.
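
to make that concrete for your case, here's a minimal end-to-end sketch - assuming the 100 arrays stack into a NumPy matrix of shape (100, n_values), and using scikit-learn's KMeans purely as one example of an unsupervised model (any clustering algorithm would slot in the same way):

import numpy as np
from sklearn.cluster import KMeans

# hypothetical stand-in data: 100 arrays of 60 integer readings each
rng = np.random.default_rng(0)
data = rng.integers(0, 500, size=(100, 60)).astype(float)

# per-array standardization, as above
data = (data - data.mean(axis=1, keepdims=True)) / (data.std(axis=1, keepdims=True) + 1e-5)

# group the 100 normalized arrays into k clusters (k=3 is an arbitrary choice here)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)
print(labels)  # cluster assignment for each of the 100 arrays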

you’ll find flavors of this kind of normalization everywhere - even internal to neural networks like the transformer (which includes explicit “layer normalization” blocks - for the same reasons).
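
as a quick illustration of that last point (PyTorch assumed here purely as an example framework), a layer-norm module performs essentially the same per-row standardization:

import torch

x = torch.randn(100, 60)             # 100 arrays of 60 values
layer_norm = torch.nn.LayerNorm(60)  # normalizes over the last dimension
y = layer_norm(x)
print(y.mean(dim=1)[:3], y.std(dim=1)[:3])  # roughly zero mean, unit std per row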
