Right way of using the DiscoFuse dataset

Below is my understanding of how the dataset should be used. Is it correct?

The columns/features from the DiscoFuse dataset that will be the input to the encoder and decoder are:

  1. coherent_first_sentence

  2. coherent_second_sentence

  3. incoherent_first_sentence

  4. incoherent_second_sentence

The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.

The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. These columns provide additional information about each example, but they are not necessary for the task of sentence fusion.
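To make the question concrete, here is a minimal sketch of one possible way to turn a row into a training example. Note that it departs from the description above: it assumes the two incoherent sentences form the encoder input and the two coherent sentences form the decoder target, since feeding the fused text to the encoder as well would hand the model the answer. That split is an assumption on my part, not something I have confirmed. The helper names `join_columns` and `to_seq2seq_example` and the sample row are hypothetical and only for illustration.

```python
from typing import Dict, Sequence, Tuple

# Column names as listed above for the DiscoFuse dataset.
SOURCE_COLUMNS = ("incoherent_first_sentence", "incoherent_second_sentence")
TARGET_COLUMNS = ("coherent_first_sentence", "coherent_second_sentence")


def join_columns(row: Dict[str, object], columns: Sequence[str]) -> str:
    """Hypothetical helper: join the non-empty values of `columns` with a space."""
    parts = [str(row.get(column, "")).strip() for column in columns]
    # Skip empty strings so a row with an empty second sentence
    # does not produce a trailing space.
    return " ".join(part for part in parts if part)


def to_seq2seq_example(row: Dict[str, object]) -> Tuple[str, str]:
    """Hypothetical helper: return (encoder_input, decoder_target) for one row.

    Assumption: incoherent sentences are the source, coherent ones the target.
    Metadata columns such as connective_string are ignored.
    """
    return join_columns(row, SOURCE_COLUMNS), join_columns(row, TARGET_COLUMNS)


# Made-up row for illustration only; it is not taken from the dataset.
row = {
    "incoherent_first_sentence": "The storm closed the airport .",
    "incoherent_second_sentence": "Many flights were cancelled .",
    "coherent_first_sentence": "The storm closed the airport , so many flights were cancelled .",
    "coherent_second_sentence": "",
    "connective_string": "so",
}

source, target = to_seq2seq_example(row)
print("encoder input :", source)
print("decoder target:", target)
```

The resulting (source, target) string pairs would still need to be tokenized before being passed to an encoder-decoder model; that step is left out here.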

Please correct me if I am wrong. If this understanding is right, how should I implement this task in practice?