I’m working through a step-by-step implementation of S-BERT for a domain-specific application, and I’d like to save my model after training. I’m using HF distilbert-base-uncased as the base model, but I’m also passing the model’s output through a concatenation step and a dense layer. If I just do torch.save(model), I’m only going to save the distilbert-base-uncased portion, not the complete model with the dense layer at the end. How should I proceed, please?
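For context, here is a minimal sketch of the setup I have around the training loop; the head dimensions (DistilBERT’s 768-dim hidden size, 3 NLI classes), the optimizer, and the loss shown here are placeholders/assumptions rather than fixed choices:

import torch
from transformers import DistilBertModel

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# HF backbone
model = DistilBertModel.from_pretrained('distilbert-base-uncased').to(device)

# dense head over the concatenated (u, v, |u - v|) vector:
# 768 is DistilBERT's hidden size, 3 classes for the NLI labels
ffnn = torch.nn.Linear(768 * 3, 3).to(device)

# both parameter groups are trained together
optim = torch.optim.Adam(list(model.parameters()) + list(ffnn.parameters()), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss()

def mean_pool(token_embeds, attention_mask):
    # mask out padding tokens, then average the remaining token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeds.size()).float()
    summed = torch.sum(token_embeds * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts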
Here is the implementation:
for batch in loop:
    # make sure model is in training mode
    model.train()
    # zero all gradients on each new step
    optim.zero_grad()
    # prepare batches and move all to the active device
    input_ids_a = batch['premise_input_ids'].to(device)
    input_ids_b = batch['hypothesis_input_ids'].to(device)
    attention_a = batch['premise_attention_mask'].to(device)
    attention_b = batch['hypothesis_attention_mask'].to(device)
    label = batch['label'].to(device)
    # extract token embeddings from BERT
    u = model(input_ids_a, attention_mask=attention_a)[0]  # all token embeddings A
    v = model(input_ids_b, attention_mask=attention_b)[0]  # all token embeddings B
    # get the mean-pooled sentence vectors
    u = mean_pool(u, attention_a)
    v = mean_pool(v, attention_b)
    # build the |u - v| tensor
    uv = torch.sub(u, v)
    uv_abs = torch.abs(uv)
    # concatenate u, v, |u - v| into a single feature vector
    x = torch.cat([u, v, uv_abs], dim=-1)
    # process the concatenated tensor through the FFNN to get class logits
    x = ffnn(x)
    # compute the loss against the NLI labels and update the weights
    # (assumes loss_fn = torch.nn.CrossEntropyLoss(), as in the setup above)
    loss = loss_fn(x, label)
    loss.backward()
    optim.step()
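One idea I’m considering, sketched below (SBERTClassifier is just a name I made up for illustration), is to wrap the backbone and the head in a single nn.Module so that saving that one module captures all the trainable weights — is this the recommended way, or is there a better pattern?

class SBERTClassifier(torch.nn.Module):
    # hypothetical wrapper: bundles the DistilBERT backbone and the dense
    # head so that one save call captures all trainable weights
    def __init__(self, backbone, head):
        super().__init__()
        self.backbone = backbone
        self.head = head

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        u = mean_pool(self.backbone(ids_a, attention_mask=mask_a)[0], mask_a)
        v = mean_pool(self.backbone(ids_b, attention_mask=mask_b)[0], mask_b)
        return self.head(torch.cat([u, v, torch.abs(u - v)], dim=-1))

sbert = SBERTClassifier(model, ffnn)
# the state_dict now includes both the backbone and the head
torch.save(sbert.state_dict(), 'sbert_nli.pt')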