Standardizing Italian translation

This post summarizes the conversations we’ve had on the GitHub issues page about standardizing the translation approach for the Italian language.

At the end of this post, I’ll try to maintain a vocabulary of common terms and their Italian translations.

Technical terms: to translate or not?

description: should we translate technical terms such as machine learning, or not? Proposed approaches:

  1. always leave it in English
  2. always translate it into Italian
  3. always translate it into Italian, but at its first occurrence also give the English term alongside it.

outcome: No final decision yet, still being discussed.
votes:

  1. davidemastricci, CaterinaBi
  2. sharkovsky, lewtun, clone

How to address the reader

description: how to translate sentences that address the reader directly, such as “But what if you want to compare…”
outcome: the proposed solution is to use the impersonal infinitive form “Ma cosa fare per paragonare…”. In exceptional cases where the sentence structure does not allow it, fall back to the informal singular form “Ma cosa fare se vuoi paragonare…”

Translating code comments

We should not translate comments in the code.

Vocabulary

Note: whether to translate some words (marked with *) is still being discussed; this is just a draft vocabulary!

English                  Italian
machine learning         apprendimento automatico*
token                    token
tokenization             tokenizzazione
tokenizer                tokenizzatore
neural network training  addestramento di una rete neurale
pretrained model         modello pre-addestrato
fine-tune                affinare*
dataset                  insieme di dati*
large dataset            insieme voluminoso di dati*
batch                    batch
Hub                      Hub
API                      API
loop                     ciclo*
training loop            ciclo di addestramento
distributed setup        sistema distribuito
custom metric            metrica personalizzata
metric                   metrica
account                  profilo*
checkpoint               checkpoint
label                    etichetta*
training set             insieme di addestramento*
validation set           insieme di validazione*
test set                 insieme di test*
overfitting              sovra-adattamento

Regarding how to address the reader, the impersonal infinitive form sounds really good to me.
For the exceptional case, I would suggest using “Ma cosa fare se VUOI…”, simply because people reading the docs or taking the course usually do so individually.

And if the intention was to keep it formal, I think the “VOLETE” form is just too formal for this context.

Hi Davide,
your suggestion about using the singular form makes sense. I’ve updated the post accordingly!

Thank you for starting this great post, @sharkovsky!

About translating the code comments, I think this can be optional, since right now the Colab notebooks we have for each chapter are still based on the English source. For reference, the other translations haven’t done this. How does that sound?

Sounds great! I’ll update the post right away.

Thank you for opening this post! A few notes:

  • For the first question (whether to translate technical terms) I’d prefer option 3.
  • I am not a fan of addressing the reader directly. When I read technical material that addresses me directly as a reader, I instinctively tend to consider it low-quality content. I understand people have different opinions about this, so I will comply with whatever the majority decides.
  • Specific terms:
    • I’m not sure preaddestrato is a valid word (I can’t find it in the dictionary), so I would write it as pre-addestrato instead.
    • account is very widely used in Italian, but we could translate it with profilo if we wanted.
    • sovradattamento has a valid meaning in Italian, but it’s quite specific. I would consider writing it as sovra-adattamento.

Thank you for your comments @clone!

I’ve updated the post to reflect your vote for the first question, and your suggestions for the words (thank you!).

For addressing the reader, is our suggested approach acceptable to you? The idea would be to use infinitive, impersonal, and passive forms whenever possible, and to address the reader directly only in the exceptional cases where the phrase would otherwise sound really weird.

Yep, the approach you proposed sounds like a good compromise.

Hi everyone.

I’d go with option number 3 as well when it comes to how we translate the technical terms; I believe @lewtun is right.

As for the ‘how to address the reader’ question, I believe the current trend in translation studies would be to use an infinitive, such as Come fare per…, i.e., forget about addressing the reader directly. All those direct expressions sound absolutely unnatural in Italian. Another possibility would be to use the first-person plural (1PP), e.g. ‘Ma come fare se vogliamo…?’. What do you think?

P.S. For fine-tune, I’d like to propose ‘ottimizzare’, and I agree that ‘pre-addestrato’ needs the hyphen.

Hi everyone.

Thanks for the forum post.

With regard to technical terms, I think I will go with option 3 as well. I also agree with the proposed solution of using the impersonal infinitive to address the reader, which is the generally accepted way to do so.

Hello everyone, I’m happy to join! Let me start by pointing out that I have no technical background in linguistics, so my opinions mainly come from my academic experience in the field of ML/DL and my (mostly personal) understanding of how Italian is commonly used.

In response to @sharkovsky and their first post (btw, thank you for kicking off the work!):

  • Technical terms: I would actually go for a sort of 4th approach (let’s say the inverse of option 3); let me first explain why. On the one hand, the real purpose of a translation is to make the content accessible to people, regardless of their familiarity with the source language. On the other hand, I think that the translation of technical content should serve the original purpose rather than mimic the text word for word: virtually no one uses translated terms in this field, and I don’t think it would be beneficial to insert them in the course. In practice, no one is going to say “apprendimento automatico” instead of “Machine Learning” in technical contexts, in the same way that no one says “condivisione delle macchine” instead of “car sharing” in everyday situations: a standard has already established itself, and I would not force our translation away from it. Besides, translating these terms improves neither readability nor comprehension, because the main gap is technical rather than linguistic (I mean, “apprendimento automatico” does not convey any more information than “Machine Learning” to a beginner). I would therefore give the translation for the sake of completeness (and coherence with the main goal of translation), maybe next to the first occurrence of the terms or by linking to the vocabulary/glossary, but I would always use the original terms, such as “Machine Learning”, in the actual text.
  • How to address the reader: I agree with @CaterinaBi’s proposed solution of using the 1PP (‘Ma come fare se vogliamo…?’), since I generally agree with clone (I’m not able to quote, sorry!).
  • Vocabulary: as already pointed out, I wouldn’t translate very widespread technical terms such as Machine Learning (see first point). As for “fine-tune”, I don’t agree with @CaterinaBi, since ‘ottimizzare’ could be misleading in the ML context (optimization algorithms such as Adam, optimization techniques such as Local Search), while I think that “affinare” conveys the correct message and is not open to misunderstanding (maybe “affinare l’addestramento” if clarification is needed). As for “large dataset”, I would go with “grande insieme di dati”. I agree with the remaining proposed translations as reported in the current version of the table.

Please bear in mind that mine are “layman’s opinions”, meaning that they are the result of my personal experience alone. Anyway, I will be glad to comply with whatever the majority decides.
P.S. Sorry for the long reply, and thank you everyone for your work!

Hi everyone,
I seem to be unable to edit/update the post. @lewtun, are you able to help me out with this? I’m not familiar with the Hugging Face forum; maybe this is normal? I need to be able to edit it in order to update the vocabulary/glossary.

It seems to me that we’ve reached a decision on:

  • how to address the reader: use the infinitive form “Ma cosa fare per paragonare…” whenever possible, and fall back on the 1PP “Ma cosa fare se vogliamo paragonare…” in exceptional cases.
  • code comments: never translate comments in the code.
  • vocabulary: I will update it as soon as I figure out how to edit my original post.

The one thing we’re still discussing is technical terms. It seems to me that two options are gaining traction, and they are kind of opposite:

  1. always translate it into Italian, but at its first occurrence also give the English term alongside it.
  2. always leave it in English, but at its first occurrence also give the Italian term alongside it.

I hadn’t thought about option 2, but now it seems the best one to me. What does everyone think?

I translated the Educational Toolkit into Italian.
I think the best option is the second one in your previous answer. While translating, I found that translating everything into Italian reads very strangely (when rereading it from an Italian reader’s perspective). I would vote for option 2 simply because it can help someone associate a given term with its translation the first time they encounter it.


Hi Moreno,
thank you for sharing your experience!

I would say that with your vote and to ensure consistency with your other translation, it would be best to go for option 2.

Do you have experience with this forum? Do you know why I cannot edit my original post anymore, and who I could contact to resolve this?

Thank you!
Francesco

Hello everyone, could one of you please check my translation of chapter1/1.mdx for consistency before I open a pull request, as requested by @lewtun? Thanks!

Unfortunately, I don’t. I think it’s the same platform used by other projects (e.g., PyTorch), but I’ve only used it as a basic user.

Hi Caterina!

First of all, congratulations on the excellent work! Your translation seemed very accurate, precise, and fluent to me! Below I list a few comments; as you’ll see, they are all very minor. As far as I’m concerned, once the typos are fixed you could go ahead and send the pull request, but if you feel like taking my other comments into account too, I’ll be happy :smiley:

  • You translated task as compito. I like that translation and I agree that it’s right to translate the word task, so I’ll add it to our standardized translation vocabulary as soon as possible.
  • typo: Accelerates shouldn’t have the final s
  • You assigned the feminine gender to Hugging Face Hub. Obviously it’s completely arbitrary, but masculine sounds better to me… What do you think?
  • typo: once you wrote Tokeniser (British English spelling). It seems to me that Hugging Face mostly follows American English, so Tokenizer.
  • I’ll also add demo to the words not to translate; it seems like a good idea!
  • You kept the Saxon genitive in fast.ai’s Practical Deep Learning for Coders, but I would remove it like this: il Practical Deep Learning for Coders di fast.ai

Thanks again!
Francesco


Thanks, Francesco!

I agree with everything. I’ve made the corrections and I’m now sending the… “richiesta di estrazione” (pull request)?? It sounds strange put like that.

Best wishes, and thank you too!

Caterina


Hello @lewtun, I’ve officially just opened my first pull request ever (!!!) for 1.mdx.

I hope all is good!

Caterina


@CaterinaBi, I took a quick look: great work so far, but there are a couple of things that I’m pretty sure need to be fixed. I’ve left you a comment on GitHub, but I’ll summarize it here:

  • Your fork contains the original English language files twice: once (correctly) in the chapters/en folder, and a second time in the main folder. You should probably delete the ones in the main folder and resubmit the pull request.
  • AFAIK, it’s more “polite” to pull the changes from the original huggingface repo and merge them into your fork before submitting a pull request, because the original repo has also progressed while you were doing your work. In this simple situation I’m sure git wouldn’t have any problems merging, but it’s usually nicer (again, as far as I know) to pull the changes from huggingface’s main branch before submitting a pull request. This way, if there are any conflicts, you can resolve them yourself instead of putting that complexity on the maintainers of the huggingface repo (though, again, I don’t expect any conflicts here). If you need help with this operation, let me know!