Summary length for knowledge graphs vs long documents

I am training an Unlimiformer model (an augmentation of bart-base) on the GovReport dataset to generate summaries. I have three experiments:

  1. Long Documents (LDs)
  2. Knowledge Graphs (KGs) of long documents
  3. KG_LD (the KG and LD concatenated, in that order, before tokenization)

Here is the odd thing:

For (1), I find that my LD summaries are always around 100 tokens.
For (2) and (3), I find that my KG and KG_LD summaries converge to around 900-1000 tokens.

My model config sets max_target_length=1024:

  "model_name_or_path": "tau/bart-base-sled",                                      
  "use_auth_token": false,                                                         
  "max_target_length": 1024,                                                       
  "fp16": true                                                                     

whereas in the checkpoint config I see that the "summarization" task has max_length=128:

    "summarization": {                                                          
      "length_penalty": 1.0,                                                    
      "max_length": 128,                                                        
      "min_length": 12,                                                         
      "num_beams": 4                                                            

In the data config, generation_max_length is 1024:

  "dataset_name": "tau/sled",                                                      
  "dataset_config_name": "gov_report",                                             
  "max_source_length": 16384,                                                      
  "generation_max_length": 1024,                                                   
  "max_prefix_length": 0,                                                          
  "pad_prefix": false,                                                             
  "num_train_epochs": 10,                                                          
  "metric_names": ["rouge"],                                                       
  "metric_for_best_model": "rouge/geometric_mean",                                 
  "greater_is_better": true                                                        

The only difference in the structure of the inputs is that the KGs are input as a single string of the form:
"<s> head_1 : relation_1 : tail_1 </s><s> head_2 : relation_2 : tail_2 </s>..."

Could this be causing the model to attempt to generate a summary for every triple? It certainly doesn't seem that way when I read the output: the LD summaries are uninformative and much shorter than the gold summaries, whereas the KG summaries are very rich in information (though often inaccurate, of course).