Given a sequence output of 256 tokens, is it reasonable to split it into two equal-length sub-sequences of 128 tokens each, and use each one for a separate, independent downstream task?
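To make the question concrete, here is a minimal sketch of the split being asked about. It assumes the sequence output is a list of 256 per-token vectors; the hidden size of 8, the random values, and the names `sequence_output`, `split_in_half`, `first_half`, and `second_half` are illustrative only and do not come from any particular model or library.

```python
import random

SEQ_LEN = 256    # number of tokens, as stated in the question
HIDDEN_SIZE = 8  # assumed for illustration only

random.seed(0)
# Stand-in for a model's sequence output: one vector per token.
sequence_output = [
    [random.random() for _ in range(HIDDEN_SIZE)] for _ in range(SEQ_LEN)
]


def split_in_half(seq):
    """Split a sequence of token vectors into two equal-length halves."""
    if len(seq) % 2 != 0:
        raise ValueError("sequence length must be even for an equal split")
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


# Tokens 0-127 go to one task, tokens 128-255 to the other.
first_half, second_half = split_in_half(sequence_output)
print(len(first_half), len(second_half))
```

Each half would then be passed to its own task-specific head, which is the setup the question asks about.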