I use the model to encode a 3-D input of shape (3, 5, 64), where 3 is the batch size, 5 is the number of utterances per sample (it can be thought of as a second batch dimension), and 64 is the sequence length.
I encode them like this:
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
input_ids = batch["input_ids"]            # (3, 5, 64); batch is the dict returned by the tokenizer
attention_mask = batch["attention_mask"]  # (3, 5, 64)

outputs = []
for i in range(input_ids.shape[0]):
    # each slice is (5, 64), so the 5 utterances act as the batch for this call
    bert_output = model(input_ids=input_ids[i], attention_mask=attention_mask[i])
    outputs.append(bert_output.last_hidden_state)
out = torch.stack(outputs)                # (3, 5, 64, 768)
I wonder if there are other ways of doing this?
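One common alternative is to avoid the Python loop entirely: fold the utterance dimension into the batch dimension, run a single forward pass, and reshape the result back. A minimal sketch of the reshaping, using a toy `torch.nn.Embedding` as a stand-in encoder (a placeholder, not BERT; with the real model you would flatten `attention_mask` the same way and read `.last_hidden_state` from the output):

```python
import torch

# Stand-in for BERT: any module mapping (batch, seq_len) token ids
# to (batch, seq_len, hidden) states.
encoder = torch.nn.Embedding(num_embeddings=30522, embedding_dim=768)

input_ids = torch.randint(0, 30522, (3, 5, 64))  # (batch, utterances, seq_len)

# Merge the two batch-like dims so the encoder sees one 2-D batch of 15 rows...
flat_ids = input_ids.view(-1, input_ids.shape[-1])   # (15, 64)
flat_out = encoder(flat_ids)                          # (15, 64, 768)

# ...then split them back apart afterwards.
out = flat_out.view(*input_ids.shape[:2], *flat_out.shape[1:])
print(out.shape)  # torch.Size([3, 5, 64, 768])
```

A single forward pass over 15 sequences is usually faster than 3 passes of 5, since the GPU sees one larger batch; the trade-off is higher peak memory per step.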