Batched Generation with Flash Attention

In the link above, they discuss batching with Flash Attention. They seem to say that all sequences should be packed into a single long sequence, rather than using the usual batching-and-padding approach.
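If I understand the idea correctly, instead of a padded `(batch, max_len)` tensor you concatenate all sequences into one flat buffer and track the boundaries with cumulative lengths, which is the `cu_seqlens` format that FlashAttention's variable-length kernels (e.g. `flash_attn_varlen_func`) consume. Here is my rough sketch of just the packing step, using NumPy stand-ins for token tensors (the function name `pack_sequences` is my own, not from the library):

```python
import numpy as np

def pack_sequences(seqs):
    """Pack variable-length sequences into one flat array plus
    cumulative sequence lengths (the cu_seqlens layout used by
    FlashAttention's varlen kernels). No padding tokens are added."""
    packed = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    cu_seqlens = np.zeros(len(seqs) + 1, dtype=np.int32)
    cu_seqlens[1:] = np.cumsum(lengths)
    return packed, cu_seqlens

# Three sequences of token ids with lengths 3, 1, and 2
seqs = [np.array([1, 2, 3]), np.array([4]), np.array([5, 6])]
packed, cu_seqlens = pack_sequences(seqs)
# packed     -> [1 2 3 4 5 6]
# cu_seqlens -> [0 3 4 6], so sequence i is packed[cu_seqlens[i]:cu_seqlens[i+1]]
```

What I don't get is the attention side: presumably the kernel uses `cu_seqlens` to stop attention from crossing sequence boundaries, so you never materialize a padding mask at all?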

I'm really quite lost; it would be very helpful to see an example of how to implement this.