Model training/inference with multiple similar models in parallel

Assume I have a pre-trained neural network from Hugging Face which I run via transformers.

Now I have several instances, say 64, and they should all make predictions with that neural network in batches. The problem is that each instance is associated with a certain sub-model/field that we want to use for it.

Now the question is: can we train the last layer, or perhaps just a bias term, separately for EACH sub-model/field, such that we can still process all requests together in one batch while each instance receives content tailored to its scientific field?
It should not be the case that we run a separate neural network per scientific field and do batch jobs there.
Instead, there should be one big batch, with the field-specific deviations applied only at the very end of each token-generation step.

Is that possible? It would be great if you could help us here.


Step 1: Shared Base Model

Load the shared pre-trained model, which serves as the feature extractor for all instances:

```python
from transformers import AutoModel, AutoTokenizer

# Load pre-trained model and tokenizer
model_name = "bert-base-uncased"
shared_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Step 2: Task-Specific Heads

For each scientific field, define a lightweight linear head or bias vector. For example:

```python
import torch
from torch import nn

# Define task-specific heads
num_fields = 64  # Number of fields
hidden_size = shared_model.config.hidden_size
task_heads = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_fields)])
```

Alternatively, you can use bias adjustments:

```python
# Define field-specific biases
field_biases = nn.ParameterList([nn.Parameter(torch.zeros(hidden_size)) for _ in range(num_fields)])
```

Step 3: Forward Pass with Batch Processing

  1. Tokenize the input data for all instances:

```python
inputs = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
```
  2. Process the batch with the shared model:

```python
shared_outputs = shared_model(**inputs).last_hidden_state  # Shape: (batch_size, seq_len, hidden_size)
```
  3. Apply the task-specific heads or biases according to the field of each instance:

```python
# Assume `fields` is a tensor indicating the field index for each instance
batch_size = len(fields)
outputs = torch.zeros_like(shared_outputs)

for i in range(batch_size):
    field_idx = int(fields[i])
    outputs[i] = task_heads[field_idx](shared_outputs[i])  # Linear head
    # Alternatively: outputs[i] = shared_outputs[i] + field_biases[field_idx]  # Bias adjustment
```
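If the Python loop over the batch ever becomes a bottleneck, the same idea can be vectorized: keep all field heads as stacked tensors and apply them with one batched matrix multiply. This is just a sketch under the assumption that `fields` is a LongTensor and that `num_fields`, `hidden_size`, and `shared_outputs` are defined as above:

```python
import torch
from torch import nn

# Stack all field heads into single tensors: one weight matrix and one bias per field
# (in practice these would live inside an nn.Module so they get registered/trained)
stacked_w = nn.Parameter(torch.randn(num_fields, hidden_size, hidden_size) * 0.02)
stacked_b = nn.Parameter(torch.zeros(num_fields, hidden_size))

W = stacked_w[fields]  # (batch_size, hidden_size, hidden_size)
b = stacked_b[fields]  # (batch_size, hidden_size)

# One batched matmul covers the whole mixed-field batch at once
outputs = torch.einsum("bsh,bho->bso", shared_outputs, W) + b[:, None, :]
```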

Step 4: Training

  • Freeze the shared model weights if only the heads/biases need to be trained.
  • Train the task-specific heads/bias terms with a task-specific loss function (e.g., cross-entropy for classification).

```python
for param in shared_model.parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(task_heads.parameters(), lr=1e-3)
```
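To make the training step concrete, here is a minimal sketch. The classification setup (`num_labels`, `labels`) and the reuse of `inputs`/`fields` from Step 3 are assumptions for illustration; in this variant the per-field heads project the pooled representation to `num_labels`:

```python
# Hypothetical classification setup: one classifier head per field
num_labels = 3  # assumption for illustration
cls_heads = nn.ModuleList([nn.Linear(hidden_size, num_labels) for _ in range(num_fields)])
optimizer = torch.optim.Adam(cls_heads.parameters(), lr=1e-3)

shared_outputs = shared_model(**inputs).last_hidden_state
pooled = shared_outputs[:, 0]  # [CLS] representation per instance
logits = torch.stack([cls_heads[int(f)](pooled[i]) for i, f in enumerate(fields)])
loss = criterion(logits, labels)  # `labels`: (batch_size,) class indices
loss.backward()
optimizer.step()
optimizer.zero_grad()
```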

@Alanturner2 Many thanks for your input, this looks very good.

I have a few questions where you can perhaps help me.

1. Can we also use classes like BioGptForCausalLM or GPTNeoXForCausalLM? Or should we consider something else?
2. Can we train each submodel individually, or should they all be trained together?
3. What would inference look like in your scenario?

It would be terrific if you could share your thoughts here.


1. Can we use BioGptForCausalLM or GPTNeoXForCausalLM?

Yes, these classes are suitable depending on your task:

  • BioGptForCausalLM: Ideal for biomedical tasks due to its domain-specific pretraining.
  • GPTNeoXForCausalLM: A strong general-purpose model for large-scale tasks.

Choose based on your domain, compute resources, and the need for fine-tuning.
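Both expose the same causal-LM interface, so they drop into the pipeline above unchanged. The checkpoint names below are just illustrative examples:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Biomedical domain (example checkpoint)
bio_model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
bio_tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")

# GPT-NeoX architecture (small example checkpoint from the Pythia family)
neox_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
neox_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
```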

2. Train submodels individually or together?

  • Individually: Works for modular architectures; fine-tune parts separately for flexibility.
  • Together: Better for end-to-end optimization but needs more resources and careful tuning.
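In code, the difference mostly comes down to which parameters the optimizer sees. A small sketch, assuming the `task_heads` module list from the earlier reply:

```python
import torch

# Fine-tune a single field's head in isolation (field index 7 is just an example)
optimizer_single = torch.optim.Adam(task_heads[7].parameters(), lr=1e-3)

# Or optimize all field heads jointly in one run
optimizer_joint = torch.optim.Adam(task_heads.parameters(), lr=1e-3)
```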

3. How does inference look?

  1. Load the model & tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model_name")
tokenizer = AutoTokenizer.from_pretrained("model_name")
```
  2. Generate text:

```python
inputs = tokenizer("Your prompt", return_tensors="pt")
output = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
  3. For modular models: pass the shared outputs through the field-specific submodel and manage compatibility; see the sketch below for applying per-field adjustments during generation.
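To make point 3 concrete for generation: one way to apply the field-specific deviation "at the very end of each token-generation step" is a learned bias over the vocabulary, added to the next-token logits through transformers' LogitsProcessor hook. The `field_logit_bias` table and `fields` tensor below are assumptions for illustration, and the sketch assumes greedy/sampling decoding (num_beams=1) so that score rows align with batch entries:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class FieldBiasLogitsProcessor(LogitsProcessor):
    """Adds a per-field bias to the next-token logits of every sequence in the batch."""

    def __init__(self, field_logit_bias: torch.Tensor, fields: torch.Tensor):
        # field_logit_bias: (num_fields, vocab_size), learned separately per field
        # fields: (batch_size,) field index of each sequence in the batch
        self.field_logit_bias = field_logit_bias
        self.fields = fields

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # scores: (batch_size, vocab_size) logits for the current decoding step
        return scores + self.field_logit_bias[self.fields]

# One mixed batch, each row biased toward its own field's vocabulary
processor = LogitsProcessorList([FieldBiasLogitsProcessor(field_logit_bias, fields)])
output = model.generate(**inputs, max_length=50, do_sample=True, logits_processor=processor)
```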

@Alanturner2 Many thanks again, this takes me quite a step further.

However, could you please elaborate a bit further on text generation when there are many different submodels to be used for the different instances?
