Assume I have a pre-trained neural network from Hugging Face which I execute via transformers.
Now I have several instances, say 64, and they should all make predictions with that neural network in batches. The problem is that each of these instances has a certain sub-model/field we want to use for it.
The question is: can we train the last layer, or perhaps just the bias term, separately for EACH sub-model/field, so that we can process all the requests together in one batch while each instance still gets content suited to its scientific field?
It should not be the case that we have a separate neural network for each scientific field and run separate batch jobs there.
Instead it should be one big batch, with the field-specific deviations applied only at the very end of each token generation step.
Is that possible? It would be great if you could help us here.
Step 1: Shared Pre-trained Model
Load the shared pre-trained model, which serves as the feature extractor for all instances:

```python
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained model and tokenizer shared by all fields
model_name = "bert-base-uncased"
shared_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Step 2: Task-Specific Heads
For each scientific field, define a lightweight linear head or bias vector. For example:
```python
import torch
from torch import nn

# Define one task-specific head per scientific field
num_fields = 64  # number of fields
hidden_size = shared_model.config.hidden_size
task_heads = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_fields)])
```
Alternatively, you can use bias adjustments:
```python
# Define one field-specific bias vector per field
field_biases = nn.ParameterList([nn.Parameter(torch.zeros(hidden_size)) for _ in range(num_fields)])
```
Step 3: Forward Pass with Batch Processing
- Tokenize the input data for all instances:

```python
inputs = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
```
- Process the batch with the shared model:

```python
shared_outputs = shared_model(**inputs).last_hidden_state  # shape: (batch_size, seq_len, hidden_size)
```
- Apply the task-specific heads or biases based on the field of each instance:

```python
# Assume `fields` is a tensor indicating the field index for each instance
batch_size = len(fields)
outputs = torch.zeros_like(shared_outputs)
for i in range(batch_size):
    field_idx = fields[i]
    outputs[i] = task_heads[field_idx](shared_outputs[i])  # linear head
    # Alternatively: outputs[i] = shared_outputs[i] + field_biases[field_idx]  # bias adjustment
```
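If the Python loop over the batch becomes a bottleneck, the same per-field heads can be applied in one vectorized step by stacking their weights and gathering the rows for each instance's field. A minimal sketch reusing the `task_heads`, `field_biases`, `shared_outputs`, and `fields` defined above (the variable names `W`, `b`, etc. are just illustrative):

```python
# Stack all head parameters once: (num_fields, hidden, hidden) and (num_fields, hidden)
W = torch.stack([h.weight for h in task_heads])
b = torch.stack([h.bias for h in task_heads])

# Gather the parameters belonging to each instance's field
W_batch = W[fields]          # (batch_size, hidden, hidden)
b_batch = b[fields]          # (batch_size, hidden)

# Apply every instance's own head in a single einsum over the whole batch
outputs = torch.einsum("bsh,boh->bso", shared_outputs, W_batch) + b_batch[:, None, :]

# Bias-only variant: add the gathered field biases to the shared hidden states
bias_batch = torch.stack(list(field_biases))[fields]   # (batch_size, hidden)
outputs_bias_only = shared_outputs + bias_batch[:, None, :]
```

Since `torch.stack` is differentiable, gradients still flow back to the original head parameters, so this can be used during training as well as inference.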
Step 4: Training
- Freeze the shared model weights to prevent fine-tuning if only the heads/biases need training.
- Train the task-specific heads/bias terms using a task-specific loss function (e.g., cross-entropy for classification).
```python
# Freeze the shared backbone; only the heads (or biases) are trained
for param in shared_model.parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(task_heads.parameters(), lr=1e-3)
```
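A single training step could then look roughly like the following. Note that `num_classes`, `classifier`, and the `labels` tensor are assumptions for illustration, since the thread does not fix a downstream task:

```python
# Hypothetical downstream setup: a shared classifier on top of the field-specific heads
num_classes = 10                                     # assumed number of target classes
classifier = nn.Linear(hidden_size, num_classes)
head_optimizer = torch.optim.Adam(
    list(task_heads.parameters()) + list(classifier.parameters()), lr=1e-3
)

def training_step(batch_texts, fields, labels):
    inputs = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():                            # the backbone is frozen
        hidden = shared_model(**inputs).last_hidden_state
    # Route every instance through its own field head, then classify from the [CLS] position
    adapted = torch.stack([task_heads[int(f)](hidden[i]) for i, f in enumerate(fields)])
    logits = classifier(adapted[:, 0, :])
    loss = criterion(logits, labels)
    head_optimizer.zero_grad()
    loss.backward()
    head_optimizer.step()
    return loss.item()
```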
@Alanturner2 Many thanks for your input, this looks very good.
I have a few questions where you can perhaps help me.
1. Can we also use classes like BioGptForCausalLM or GPTNeoXForCausalLM? Or should we consider something else?
2. Can we train each submodel individually, or should it all be trained together?
3. How would inference look in your scenario?
It would be terrific if you could share your thoughts here.
1. Can we use BioGptForCausalLM or GPTNeoXForCausalLM?
Yes, these classes are suitable depending on your task:
- BioGptForCausalLM: Ideal for biomedical tasks due to domain-specific pretraining.
- GPTNeoXForCausalLM: A strong general-purpose model for large-scale tasks.
Choose based on domain, compute resources, and the need for fine-tuning.
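Both load through their `ForCausalLM` classes (or via `AutoModelForCausalLM`), so the per-field heads above can sit on top of either. A small sketch; the checkpoint names below are only examples:

```python
from transformers import AutoTokenizer, BioGptForCausalLM, GPTNeoXForCausalLM

# Biomedical domain: BioGPT
biogpt = BioGptForCausalLM.from_pretrained("microsoft/biogpt")
biogpt_tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")

# General purpose: a GPT-NeoX-architecture checkpoint (e.g. Pythia)
neox = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")
neox_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
```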
2. Train submodels individually or together?
- Individually: Works for modular architectures; fine-tune parts separately for flexibility.
- Together: Better for end-to-end optimization but needs more resources and careful tuning (a small sketch of both setups follows below).
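A minimal sketch of the two options, assuming the `task_heads` and `field_biases` from above; the optimizer names are illustrative:

```python
# Individually: one optimizer per field, each seeing only that field's head,
# so a batch from field k only updates task_heads[k]
field_optimizers = [
    torch.optim.Adam(task_heads[i].parameters(), lr=1e-3) for i in range(num_fields)
]

# Together: one optimizer over all heads (and field biases), trained on mixed-field batches
joint_optimizer = torch.optim.Adam(
    list(task_heads.parameters()) + list(field_biases.parameters()), lr=1e-3
)
```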
3. How does inference look?
- Load the model & tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model_name")
tokenizer = AutoTokenizer.from_pretrained("model_name")
```
- Generate text:

```python
inputs = tokenizer("Your prompt", return_tensors="pt")
output = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
- For modular models: pass outputs between submodels and manage compatibility; one way to apply the per-field deviations inside a single generation batch is sketched below.
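One way to keep all 64 instances in a single `generate()` call while still applying a field-specific deviation at every decoding step is a custom LogitsProcessor that adds a per-field bias to the next-token logits. This is a sketch under assumptions: the vocabulary-sized `field_logit_biases`, the `FieldBiasProcessor` class, and the example prompts and field indices are illustrative, not part of the library:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

model = AutoModelForCausalLM.from_pretrained("model_name")
tokenizer = AutoTokenizer.from_pretrained("model_name")
tokenizer.padding_side = "left"                    # decoder-only models should be left-padded
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

num_fields = 64
# Assumed: one trained bias vector over the vocabulary per scientific field
field_logit_biases = torch.nn.Parameter(torch.zeros(num_fields, model.config.vocab_size))

class FieldBiasProcessor(LogitsProcessor):
    """Adds each instance's field-specific bias to the next-token logits at every step."""
    def __init__(self, field_ids: torch.LongTensor):
        self.field_ids = field_ids                 # shape: (batch_size,)

    def __call__(self, input_ids, scores):
        # scores: (batch_size, vocab_size) -> add the bias row of each instance's field
        return scores + field_logit_biases[self.field_ids].to(scores.device)

# One big mixed-field batch: prompts plus the field index of each instance
prompts = ["Prompt for instance 1", "Prompt for instance 2"]   # ... up to 64 prompts
fields = torch.tensor([0, 17])                                 # field index per instance
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    logits_processor=LogitsProcessorList([FieldBiasProcessor(fields)]),
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

During training these bias rows could be learned the same way as the field biases above, with the backbone frozen; at inference time the whole batch still shares one forward pass and only the logits differ per field.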
@Alanturner2 Many thanks again, this brings me quite a bit further.
However, could you please elaborate a little further on text generation when many different submodels are used for the different instances?