How to group sentences in a dataset for multi-turn dialogue conversations?

I am trying to train an encoder-decoder model on the empathetic_dialogues dataset from Hugging Face Datasets.

The dialogue is formatted as follows:

Here, conv_id identifies a unique conversation, and speaker_idx distinguishes the speaker from the listener. I would like to group the utterances as follows:

For utterance index 1: input is … utterance1 …
For utterance index 2: input is … utterance1 … … utterance2 … … utterance2 …
and so on.
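A minimal sketch of this grouping in plain Python for a single, already ordered conversation. It assumes that the input for a turn is all utterances so far joined with a separator and that the target is the next utterance (the reply). The function name make_turn_pairs, the " | " separator and the max_history cutoff are hypothetical choices, not something the dataset defines; in practice the separator would usually be a special token from your tokenizer.

```python
from typing import List, Tuple


def make_turn_pairs(
    utterances: List[str],
    sep: str = " | ",  # hypothetical separator; swap in your tokenizer's sep/eos token
    max_history: int = 0,  # 0 keeps the full history; >0 keeps only the last N turns
) -> List[Tuple[str, str]]:
    """Turn one ordered conversation into (history, reply) pairs.

    Pair k uses utterances[0..k] as the encoder input and
    utterances[k + 1] as the decoder target.
    """
    pairs = []
    for k in range(len(utterances) - 1):
        history = utterances[: k + 1]
        if max_history > 0:
            # Truncate to the most recent turns to bound the input length.
            history = history[-max_history:]
        pairs.append((sep.join(history), utterances[k + 1]))
    return pairs


conversation = ["utterance1", "utterance2", "utterance3"]
for source, target in make_turn_pairs(conversation):
    print(repr(source), "->", repr(target))
```

A conversation with n utterances yields n - 1 training pairs; max_history is one common way to keep long dialogues within the encoder's maximum length.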

Is there a way to achieve this in Hugging Face Datasets without converting it to a DataFrame and back? A follow-up question: what general pipeline is used in industry to train such a multi-turn dialogue agent?
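One way this might be done without pandas is a batched Dataset.map: in batched mode the mapped function may return a different number of rows than it receives, provided the original columns are removed. The sketch below assumes the rows carry conv_id, utterance_idx and utterance columns and that each conversation lies entirely within one batch. The function name group_dialogues, the " | " separator and the output column names source/target are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_dialogues(batch: Dict[str, list], sep: str = " | ") -> Dict[str, List[str]]:
    """Batched map function: flat utterance rows -> (source, target) rows.

    `batch` is a dict of column lists, the shape Dataset.map(batched=True)
    passes in. Assumes each conversation is fully contained in `batch`.
    """
    # Collect (utterance_idx, utterance) per conversation, in first-seen order.
    turns = defaultdict(list)
    for conv_id, idx, text in zip(
        batch["conv_id"], batch["utterance_idx"], batch["utterance"]
    ):
        turns[conv_id].append((int(idx), text))

    conv_ids, sources, targets = [], [], []
    for conv_id, items in turns.items():
        # Order each conversation by utterance index before pairing.
        ordered = [text for _, text in sorted(items, key=lambda t: t[0])]
        for k in range(1, len(ordered)):
            conv_ids.append(conv_id)
            sources.append(sep.join(ordered[:k]))  # utterances 1..k
            targets.append(ordered[k])  # the reply, utterance k + 1
    return {"conv_id": conv_ids, "source": sources, "target": targets}
```

With a loaded split ds, the call might look like ds.map(group_dialogues, batched=True, batch_size=None, remove_columns=ds.column_names). Passing batch_size=None hands the whole split to the function as a single batch, so no conversation is cut at a batch boundary, at the cost of holding the split in memory during the map.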

Thanks in advance for the help. This is my first question on the forum, so if I have made any mistakes, please let me know and I will correct them quickly. 🙂