Smaller embedding size causes lower loss

When I converted my multilingual transformer model to a monolingual one (I took my language's token embeddings from the multilingual model, deleted the other embeddings, and shrank the embedding layer to match), the loss became much lower, but I don't understand why. What could be the reason for that?
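
For reference, here is roughly the pruning step I mean (a minimal sketch, assuming an XLM-RoBERTa checkpoint and the Hugging Face `transformers` library; `kept_ids` is a hypothetical placeholder for the special-token ids plus my language's token ids, which you would collect from a monolingual corpus):

```python
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Hypothetical: ids of the rows to keep (special tokens + target-language
# tokens), e.g. collected by tokenizing a monolingual corpus.
kept_ids = sorted(set(tokenizer.all_special_ids) | {5, 6, 7})  # placeholder

# Copy the kept rows of the input embedding matrix into a smaller layer.
old_emb = model.get_input_embeddings()          # shape: (vocab_size, hidden)
new_emb = nn.Embedding(len(kept_ids), old_emb.embedding_dim)
new_emb.weight.data = old_emb.weight.data[kept_ids].clone()

model.set_input_embeddings(new_emb)
model.config.vocab_size = len(kept_ids)

# The MLM head's bias is also per-token, so prune it to match
# (attribute names here are specific to the RoBERTa-style LM head).
model.lm_head.bias = nn.Parameter(model.lm_head.bias.data[kept_ids].clone())
model.lm_head.decoder.bias = model.lm_head.bias

# Re-tie the output projection to the pruned input embeddings.
model.tie_weights()
```

The tokenizer has to be remapped as well, so that each kept token's new id matches its new row index in the pruned embedding matrix.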