I have a question about the Transformer architecture. As I understand it, the architecture stacks multiple encoder layers and multiple decoder layers. Does every decoder layer receive information from the encoder? If so, do they all receive the same output from the last encoder layer? Or does only the first decoder layer receive the output of the last encoder layer?