Help needed: building a model from scratch to predict the outcome of a bioprocess

Hey Happy People,

I’m completely new to Hugging Face, and so far I’m super excited and fascinated. I don’t have much experience with modeling, but I’m really interested in learning more.

I would love to hear your tips and advice on my problem! :hugs:

  • I’m currently working on a university project (as a student) in which we produce an enzyme in a fermentation process. We can judge the quality of the product with a measurement method that yields a spectrum with several peaks.

  • There will be 8 different batches, each with different cultivation parameters. Every batch is represented as a time-series dataset with about 7-8 features and about 400 individual datapoints. For every batch I will have 7 spectra, which will allow me to evaluate the quality of the product. The 7 spectra are linked to different timestamps in the dataset.

  • I want to build a model that predicts the outcome of a batch (that is, the spectrum measurement, and therefore the quality of the product) based only on the starting parameters.
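
With only 8 batches, the mapping from starting parameters to spectrum has just 8 training examples, so a low-capacity model checked by leave-one-batch-out validation is a reasonable first baseline. The sketch below is one such baseline under stated assumptions, not a definitive method: it assumes one target spectrum per batch (for example the one at the last timestamp), compresses the spectra into a few principal-component scores, regresses those scores on the starting parameters with ridge regression, and reports the error on each held-out batch. The data, the 3 starting parameters, the 200-point spectrum, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(Y, n_components):
    """Center the spectra and return (mean spectrum, top principal directions)."""
    mean = Y.mean(axis=0)
    _, _, vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, vt[:n_components]

def fit_ridge(X, Z, alpha):
    """Closed-form ridge regression of Z on X with an unpenalized intercept."""
    x_mean, z_mean = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - x_mean, Z - z_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Zc)
    return W, x_mean, z_mean

def leave_one_batch_out(X, Y, n_components=2, alpha=1e-3):
    """Hold out each batch in turn; return the RMSE of its predicted spectrum."""
    errors = []
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        pca_mean, comps = fit_pca(Y[train], n_components)
        scores = (Y[train] - pca_mean) @ comps.T            # spectra -> a few scores
        W, x_mean, z_mean = fit_ridge(X[train], scores, alpha)
        pred_scores = (X[i] - x_mean) @ W + z_mean         # start params -> scores
        y_hat = pred_scores @ comps + pca_mean             # scores -> full spectrum
        errors.append(np.sqrt(np.mean((y_hat - Y[i]) ** 2)))
    return np.array(errors)

# Hypothetical stand-in data: 8 batches x 3 starting parameters, spectra with
# two peaks whose heights depend linearly on the parameters, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(8, 3))
wavelengths = np.linspace(0.0, 1.0, 200)
peaks = np.stack([np.exp(-((wavelengths - c) / 0.05) ** 2) for c in (0.3, 0.7)])
heights = X @ np.array([[1.0, 0.2], [0.5, 1.0], [0.1, 0.3]])
Y = heights @ peaks + 0.01 * rng.normal(size=(8, 200))

errors = leave_one_batch_out(X, Y)
print("per-batch RMSE:", errors.round(4))
```

Compressing the spectrum first keeps the number of regression outputs small, which matters when there are far fewer batches than spectral points; comparing against simply predicting the mean training spectrum shows whether the starting parameters carry any signal at all.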

One big question comes up for me:

How do I train the model properly? I need a target value for every datapoint, but I only have 7 spectra per batch against about 400 datapoints. How can I handle that? One option is to interpolate the missing values, but I’m not sure that is the right approach. I’ve also read about Gaussian processes, which can do this kind of interpolation, but I still need to learn more about them. Do you have any ideas on how to deal with such a dataset?
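
As an illustration of the Gaussian-process idea, the sketch below interpolates a scalar quality score, measured at 7 sampling times, onto the full time grid of the process data, and also returns an uncertainty band that grows away from the measured points. It assumes the spectrum has already been reduced to one number per timestamp (a hypothetical "peak area"), uses a squared-exponential kernel, and fixes the hyperparameters by hand; in practice they would be fitted, for example by maximizing the marginal likelihood as scikit-learn's `GaussianProcessRegressor` does. All values and names here are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_interpolate(t_obs, y_obs, t_query, length_scale=10.0, signal_var=1.0, noise_var=1e-4):
    """GP posterior mean and std at t_query (data mean subtracted, fixed hyperparameters)."""
    y_mean = y_obs.mean()
    K = rbf_kernel(t_obs, t_obs, length_scale, signal_var) + noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_query, t_obs, length_scale, signal_var)
    L = np.linalg.cholesky(K)                                   # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - y_mean))
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v ** 2, axis=0)                   # diagonal of posterior covariance
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical example: 7 quality scores at 7 sampling times (hours),
# interpolated onto ~400 timestamps of the process data.
t_obs = np.array([0.0, 8.0, 16.0, 24.0, 32.0, 40.0, 48.0])
y_obs = np.array([0.0, 0.1, 0.35, 0.6, 0.8, 0.9, 0.95])
t_grid = np.linspace(0.0, 48.0, 400)
mean, std = gp_interpolate(t_obs, y_obs, t_grid)
```

Interpolated values are model outputs, not measurements, so if they are used as training targets it may help to weight them by `std` or to train only on the 7 measured timestamps and use the interpolation just for visualization.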

What would your general approach to this situation be, and do you have any suggestions for what kind of model would suit such a task? :saluting_face:

(People have also used hybrid models that combine mechanistic and ML components to predict biological processes. If anyone has useful information on how to integrate something like this, I would be happy to hear it.)
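
One common hybrid layout is the "parallel" or residual structure: a mechanistic model gives a first prediction, and a data-driven model learns only what the mechanistic model misses. The sketch below assumes simple Monod growth kinetics as the mechanistic part (integrated with explicit Euler steps) and uses a linear least-squares fit as a placeholder for the data-driven part, which could just as well be a Gaussian process or a neural network. The kinetic parameter values, the synthetic "observed" biomass with its extra drift term, and all function names are hypothetical. Another common layout is the serial one, where the ML part predicts kinetic parameters (such as `mu_max`) from process conditions and the mechanistic model does the simulation.

```python
import numpy as np

def monod_simulate(x0, s0, mu_max, k_s, yield_xs, t):
    """Explicit-Euler integration of a Monod batch model (biomass x, substrate s)."""
    x, s = np.empty_like(t), np.empty_like(t)
    x[0], s[0] = x0, s0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        mu = mu_max * s[k - 1] / (k_s + s[k - 1])               # specific growth rate
        growth = min(mu * x[k - 1] * dt, yield_xs * s[k - 1])   # cannot use more substrate than is left
        x[k] = x[k - 1] + growth
        s[k] = s[k - 1] - growth / yield_xs
    return x, s

def residual_features(t, x_mech):
    """Hypothetical inputs for the data-driven correction term."""
    return np.column_stack([np.ones_like(t), t, x_mech])

def fit_residual(t, x_mech, x_obs):
    """Least-squares fit of the mismatch (observed minus mechanistic)."""
    coef, *_ = np.linalg.lstsq(residual_features(t, x_mech), x_obs - x_mech, rcond=None)
    return coef

def hybrid_predict(t, x_mech, coef):
    """Parallel hybrid: mechanistic prediction plus learned correction."""
    return x_mech + residual_features(t, x_mech) @ coef

def rmse(pred, target):
    return np.sqrt(np.mean((pred - target) ** 2))

# Hypothetical batch: 48 h sampled at 400 timestamps.
t = np.linspace(0.0, 48.0, 400)
x_mech, s_mech = monod_simulate(x0=0.1, s0=20.0, mu_max=0.3, k_s=1.0, yield_xs=0.5, t=t)
rng = np.random.default_rng(1)
# Synthetic "measurements" with a drift the mechanistic model does not capture.
x_obs = x_mech + 0.02 * t + 0.05 * rng.normal(size=t.size)
coef = fit_residual(t, x_mech, x_obs)
x_hybrid = hybrid_predict(t, x_mech, coef)
print(f"mechanistic RMSE {rmse(x_mech, x_obs):.3f}, hybrid RMSE {rmse(x_hybrid, x_obs):.3f}")
```

Because the data-driven part only has to model the residual, it typically needs less data than a purely black-box model, which may matter with only 8 batches.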