Hi,
I am using LEDForConditionalGeneration to fine-tune a summarization model. I am able to fine-tune the network and get good results for my purpose. Now, however, I want to modify the encoder to use a different architecture. How can I break model.generate()
into the individual components of the transformer?
For example, I want something like this:
outputs = model(input_ids)
encoder_outputs = outputs["encoder_last_hidden_state"]  # encoder outputs
# similar logic for the decoder outputs, then beam search to get text sentences
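To make concrete what I mean by "individual components", here is a toy pure-PyTorch sketch of the encode-once / decode-step-by-step split I would like to reproduce. This is my own illustration, not LED or the transformers API; ToySeq2Seq, greedy_generate, and all the sizes are made-up, and I use greedy decoding instead of beam search just to show the loop structure:

```python
# Toy illustration of the split I am after: run the encoder once, then
# decode token by token. NOT LED -- just a minimal encoder-decoder.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, HIDDEN, BOS, EOS = 32, 16, 1, 2

class ToySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.decoder_cell = nn.GRUCell(HIDDEN, HIDDEN)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def encode(self, input_ids):
        # Encoder pass, done exactly once per input.
        enc_out, _ = self.encoder(self.embed(input_ids))
        return enc_out.mean(dim=1)  # crude fixed-size "context" vector

    def decode_step(self, token_id, hidden):
        # One decoder step: consume the previous token, emit next-token logits.
        hidden = self.decoder_cell(self.embed(token_id), hidden)
        return self.lm_head(hidden), hidden

def greedy_generate(model, input_ids, max_length=8):
    hidden = model.encode(input_ids)   # encoder output reused every step
    token = torch.tensor([BOS])
    out = []
    for _ in range(max_length):
        logits, hidden = model.decode_step(token, hidden)
        token = logits.argmax(dim=-1)  # greedy; beam search would track k hypotheses
        if token.item() == EOS:
            break
        out.append(token.item())
    return out

model = ToySeq2Seq().eval()
summary = greedy_generate(model, torch.tensor([[3, 4, 5, 6]]))
print(summary)
```

With this structure I could swap out encode() for a different architecture while leaving the decoding loop untouched; my question is how to get the equivalent split (plus beam search) out of LEDForConditionalGeneration so that it matches model.generate() exactly.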
How can I obtain the encoder and decoder outputs separately and still get exactly the same result as model.generate()? My current code using model.generate()
is as follows:
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# (checkpoint name is just a placeholder for my fine-tuned model)
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

ARTICLE_TO_SUMMARIZE = "Some long document that I want to summarize."
inputs = tokenizer.encode(ARTICLE_TO_SUMMARIZE, return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs)
global_attention_mask[:, 0] = 1
# Generate Summary
summary_ids = model.generate(inputs, global_attention_mask=global_attention_mask, num_beams=3, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
Any help in this direction will be appreciated. Thanks!