Adapting a Big Tech CV model for Object Detection to TIFF images


Because of the length of my question, I thought the forum was better suited than the Discord server. I swear I browsed (a little) the other topics from beginners like me and did not find an answer.
My questions are very basic, almost dumb I know. But I need confirmation.

So, I’m (also) new to neural networks, still learning, and I have a project for my training program. I could choose a topic and I decided to help a friend who is doing her PhD in solid-state physics and has a hard time with images. Hence the CV topic.
She studies adsorption (molecules binding to the surface of 2D materials), which she investigates via an STM (‘Scanning Tunneling Microscope’ → microscopy based on the electronic density of the atoms). She ends up with images displaying the atoms as ‘circles’, more or less bright and more or less big depending on the distance (because at that scale, a surface is not a plane, as some bonds between the atoms are angled). I can’t upload an overall image as an example because of an NDA, but I could upload a zoom on a region to give a taste of the look of the image without disclosing the overall atomic structure, sorry.
She has thousands of images of 80 to 100 nm² on which she has to count the number of adsorbed molecules after treatment and classify them depending on where they adsorbed with respect to the structure of the surface. The end goal is to compute some statistics and study how they depend on the treatment conditions. She is doing it (the detection, categorization and counting) by eye, on JPEG images, and that takes a while. And she will have finished by the end of her PhD. So I thought: Computer Vision.
I browsed the Tasks page and ended up with the Object Detection one. I’d like to draw a colored box on each interesting object and to perform the statistics so she has the result in one go.

We started labelling the JPEG images with CVAT. But then her supervisor came by and said: “Hey, the source file from the STM gives the depth at each location with ångström precision. The heights on the surface go up to some tens of nanometers and we have that information. If you work with JPEG, you know you have a loss of information due to the 8-bit compression, right?” And he was right, and I had not thought about it at all.
In the end, his suggestion was to work with TIFF images, which I barely know.
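
To make the supervisor's point concrete for myself, here is a rough back-of-the-envelope sketch. The 30 nm height range is a made-up figure standing in for "some tens of nanometers", and it only looks at the number of grey levels (it ignores the extra loss from JPEG's lossy compression):

```python
# Hypothetical numbers: a 30 nm height range recorded at 0.1 nm (1 angstrom) resolution.
height_range_nm = 30.0
angstrom_nm = 0.1

# Number of distinct height values the instrument can report over that range.
levels_needed = round(height_range_nm / angstrom_nm) + 1

# Smallest height difference that survives when the range is spread over 8 or 16 bits.
step_8bit_nm = height_range_nm / (2**8 - 1)
step_16bit_nm = height_range_nm / (2**16 - 1)

print(f"levels needed: {levels_needed}")
print(f"8-bit step:  {step_8bit_nm:.4f} nm")
print(f"16-bit step: {step_16bit_nm:.6f} nm")
```

Under these assumptions, 8 bits cannot separate heights that differ by one ångström, while a 16-bit format can.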

I intended to take a CV model from HF (let’s say Facebook’s detr-resnet-50. Why? It is currently the first one presented on the Object Detection page) and adapt it to my case by changing the output layer and fine-tuning it on something like 50 labelled images.
But now I’m confused. This model, as well as all the others I checked on HF, has been trained on 8-bit formats (JPEG or PNG) and I’m not sure whether it can be used with TIFF images.
I spent some time browsing and kept these two sources: Training a CNN with TIFF images in pytorch - PyTorch Forums and Working with non-8-bit images - Albumentations Documentation.
But I’d like a confirmation.

Here is a list of questions I have at the moment:

  • I know the 8-bit encoding is normalized anyway; for the model I cited before, the RGB channels have a mean of (0.485, 0.456, 0.406). So, by applying a standard normalization, I should be able to apply the model. (right?) (–> In the end, for the model, an image is ‘just’ a tensor of floats.)
  • By the way, what should I do if my means are far away from the ones presented above? Should I suspect the model will not perform as well? Should this info on my dataset guide my choice of model? Do we have any clue on how to use this kind of info?
  • I also think I read somewhere that TIFF images can have more channels than RGB. If so, an architecture developed to work with three channels cannot be applied as is. (I’m pretty sure of this one, I just need a yes to be fully convinced.)
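
To show what I have in mind for the first and last points, here is a minimal sketch, not something I have validated on real data. It assumes the TIFF has already been loaded as a single-channel 2D NumPy array (a random 16-bit array stands in for a real scan), rescales it per image to [0, 1], copies it into three channels, and applies the means quoted above together with the usual ImageNet standard deviations (0.229, 0.224, 0.225). The function name is hypothetical.

```python
import numpy as np

def height_map_to_model_input(height_map,
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225)):
    """Turn a single-channel 2D array (any bit depth) into a (3, H, W) float32 array."""
    x = height_map.astype(np.float32)

    # Per-image min-max scaling to [0, 1]. Assumption: a dataset-wide scale
    # may be preferable if absolute heights must stay comparable across images.
    lo, hi = x.min(), x.max()
    x = (x - lo) / max(hi - lo, 1e-12)

    # Replicate the single channel so a 3-channel (RGB) backbone accepts it.
    x = np.stack([x, x, x], axis=0)

    # Channel-wise normalization, as done for the pretrained model's inputs.
    mean = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
    return (x - mean) / std

# Stand-in for a real 16-bit scan; a real file would be read with a TIFF reader.
rng = np.random.default_rng(0)
fake_scan = rng.integers(0, 2**16, size=(64, 64), dtype=np.uint16)

model_input = height_map_to_model_input(fake_scan)
print(model_input.shape, model_input.dtype)
```

If this reasoning holds, the bit depth of the file only matters before the conversion to floats, and the channel count is the part that has to match what the architecture expects.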

Thank you in advance!