T5 Model, T5 Encoder Model and T5 Model for Conditional Generation

I’m trying to understand the differences between the three models named in the title.

My understanding so far:
The T5 Model is built as an encoder-decoder (similar to an autoencoder, I guess?). The T5 Encoder Model is just the encoder part of it.

Assume I have a tokenized sentence of length N. Applying the T5 encoder gives a tensor of size N x d_model, i.e. one d_model-dimensional embedding vector per token.
(1.) a. What happens if I apply the whole model? What’s the output dimensionality?
(1.) b. What would the output be and how could I use it differently from the encoder part? Is it meaningful?
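
To make question 1 concrete, here is a minimal sketch of the comparison I have in mind. It assumes the Hugging Face transformers classes T5EncoderModel and T5Model and uses a tiny, randomly initialized config, so the sizes (vocab_size=100, d_model=32, N=7, M=5) are made up and no pretrained weights are involved. It only compares output shapes and says nothing about how meaningful the vectors are.

```python
import torch
from transformers import T5Config, T5EncoderModel, T5Model

# Tiny made-up config so nothing has to be downloaded (sizes are assumptions).
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4)

N, M = 7, 5  # N = source length, M = target (decoder) length
input_ids = torch.randint(2, config.vocab_size, (1, N))
decoder_input_ids = torch.randint(2, config.vocab_size, (1, M))

encoder_only = T5EncoderModel(config).eval()
full_model = T5Model(config).eval()

with torch.no_grad():
    # Encoder only: one d_model vector per source token.
    enc_out = encoder_only(input_ids=input_ids)
    enc_shape = tuple(enc_out.last_hidden_state.shape)

    # Full model: needs decoder inputs as well as encoder inputs.
    full_out = full_model(input_ids=input_ids,
                          decoder_input_ids=decoder_input_ids)
    dec_shape = tuple(full_out.last_hidden_state.shape)
    full_enc_shape = tuple(full_out.encoder_last_hidden_state.shape)

print("encoder only:        ", enc_shape)
print("full model (decoder):", dec_shape)
print("full model (encoder):", full_enc_shape)
```

If I read this correctly, the full model returns one d_model vector per decoder position (M of them) and also exposes the encoder states (N of them), so its main output depends on what I feed the decoder.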

Regarding the Conditional Generation Model: given an input sequence, it can generate an output sequence (e.g. a translation).
(2.) a. What is the purpose of the conditional generation T5 Model, i.e. what does it provide that the plain T5 Model does not?
(2.) b. Where do they differ?
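
For question 2, this is the only structural difference I could spot so far, sketched under the same assumptions as before (Hugging Face transformers, a tiny randomly initialized config with made-up sizes): T5ForConditionalGeneration appears to add a linear language-modeling head (lm_head) on top of the decoder, so it returns scores over the vocabulary instead of hidden states.

```python
import torch
from transformers import T5Config, T5Model, T5ForConditionalGeneration

# Tiny made-up config (sizes are assumptions, no pretrained weights).
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4)

N, M = 7, 5  # source and target lengths
input_ids = torch.randint(2, config.vocab_size, (1, N))
decoder_input_ids = torch.randint(2, config.vocab_size, (1, M))

plain = T5Model(config).eval()
cond_gen = T5ForConditionalGeneration(config).eval()

with torch.no_grad():
    # Plain model: hidden states of size d_model per decoder position.
    hidden_shape = tuple(
        plain(input_ids=input_ids,
              decoder_input_ids=decoder_input_ids).last_hidden_state.shape
    )
    # Conditional generation model: vocabulary logits per decoder position.
    logits = cond_gen(input_ids=input_ids,
                      decoder_input_ids=decoder_input_ids).logits
    logits_shape = tuple(logits.shape)

# The extra layer that maps d_model -> vocab_size.
head_in = cond_gen.lm_head.in_features
head_out = cond_gen.lm_head.out_features

print("T5Model hidden states:", hidden_shape)
print("T5ForConditionalGeneration logits:", logits_shape)
print("lm_head:", head_in, "->", head_out)
```

So the plain model stops at M x d_model hidden states, while the conditional generation model turns them into M x vocab_size scores that can be decoded into tokens. Whether that is the whole story is what I am asking.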

Moreover, assuming I have pairs of sentences in 2 languages:
(“Hello, I am Bob”, “Hola, soy bob”)
(3.) a. How should I train / fine-tune the T5 Model so that it gives me a shared embedding space reflecting properties of both languages? Is that even possible?
(3.) b. Would I need to use the conditional generation T5 Model?
(3.) c. How would I prepare my data for this task?
(3.) d. Assuming I have a T5 Model trained via MLM for English, can I adapt it for translation through fine-tuning, or would I need to train a T5 Model from scratch?
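
Regarding 3c, this is my current guess at the data format, as a minimal sketch in plain Python. It assumes the text-to-text convention of putting a task prefix in front of the source sentence; the prefix string "translate English to Spanish: " and the field names input_text / target_text are my own assumptions, not something I have confirmed is required.

```python
# Sentence pairs as (English, Spanish) tuples.
pairs = [("Hello, I am Bob", "Hola, soy bob")]

# Assumed task prefix; the exact wording is a guess on my part.
PREFIX = "translate English to Spanish: "

def to_text_to_text(sentence_pairs, prefix=PREFIX):
    """Turn (source, target) pairs into prefixed input/target records."""
    return [
        {"input_text": prefix + source, "target_text": target}
        for source, target in sentence_pairs
    ]

examples = to_text_to_text(pairs)
for example in examples:
    print(example)
```

My understanding is that input_text would then be tokenized as the encoder input and target_text as the labels, but I would like confirmation that this is the right setup.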