Right way of using discofuse dataset

akesh1235 · June 14, 2023, 8:34am

Click here for Dataset link
Below is my understanding of how to use it. Is it correct?

The columns/features from the DiscoFuse dataset that will be the input to the encoder and decoder are:

coherent_first_sentence
coherent_second_sentence
incoherent_first_sentence
incoherent_second_sentence

The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.
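For comparison, here is a minimal sketch of one possible seq2seq framing. It is an assumption, not something confirmed in this thread: the two incoherent sentences are joined into the encoder input, and the coherent text is the string the decoder is trained to produce. The helper name `build_fusion_pair` and the output keys `source` and `target` are hypothetical, and the sample row is made up for illustration.

```python
def build_fusion_pair(example):
    """Turn one DiscoFuse-style row (a dict) into a source/target text pair."""
    # Assumed framing: the incoherent sentences form the encoder input.
    source_parts = [
        example["incoherent_first_sentence"],
        example["incoherent_second_sentence"],
    ]
    # Assumed framing: the coherent text is what the decoder learns to emit.
    target_parts = [
        example["coherent_first_sentence"],
        example["coherent_second_sentence"],
    ]
    # Skip empty fields so a row with only one coherent sentence joins cleanly.
    source = " ".join(p.strip() for p in source_parts if p and p.strip())
    target = " ".join(p.strip() for p in target_parts if p and p.strip())
    return {"source": source, "target": target}


# Made-up row for illustration only (not taken from the dataset).
row = {
    "coherent_first_sentence": "It rained all day , so we stayed home .",
    "coherent_second_sentence": "",
    "incoherent_first_sentence": "It rained all day .",
    "incoherent_second_sentence": "We stayed home .",
}

pair = build_fusion_pair(row)
print(pair["source"])
print(pair["target"])
```

If the data is loaded with the `datasets` library, a function like this could be applied to every row with `Dataset.map`, and the resulting `source` and `target` strings tokenized for whichever encoder-decoder model is used.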

The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. These columns provide additional information about each example, but they are not necessary for the task of sentence fusion.
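As a rough sketch of that separation, the row can be split into the text fields the model sees and the metadata kept aside (for example, for later error analysis). The function name `split_row`, the two tuple constants, and the sample values are hypothetical and chosen only for illustration.

```python
# Columns that carry sentence text.
TEXT_COLUMNS = (
    "coherent_first_sentence",
    "coherent_second_sentence",
    "incoherent_first_sentence",
    "incoherent_second_sentence",
)

# Columns treated here as metadata, not fed to the encoder or decoder.
METADATA_COLUMNS = (
    "discourse_type",
    "connective_string",
    "has_coref_type_pronoun",
    "has_coref_type_nominal",
)


def split_row(example):
    """Split one row (a dict) into (text_fields, metadata_fields)."""
    text = {name: example[name] for name in TEXT_COLUMNS}
    # Metadata is optional in this sketch: missing columns are skipped.
    metadata = {name: example[name] for name in METADATA_COLUMNS if name in example}
    return text, metadata


# Made-up row for illustration only (values are not taken from the dataset).
row = {
    "coherent_first_sentence": "It rained all day , so we stayed home .",
    "coherent_second_sentence": "",
    "incoherent_first_sentence": "It rained all day .",
    "incoherent_second_sentence": "We stayed home .",
    "discourse_type": "PAIR_CONN",
    "connective_string": "so",
    "has_coref_type_pronoun": 0.0,
    "has_coref_type_nominal": 0.0,
}

text, metadata = split_row(row)
print(sorted(text))
print(sorted(metadata))
```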

Please correct me if I am wrong; otherwise, if this understanding is right, how shall I implement this task practically?
