Standardizing Italian translation

sharkovsky · April 1, 2022, 8:47am

This post summarizes the conversations we’ve had on the GitHub issues page about standardizing the translation approach for the Italian language.

At the end of this post, I’ll try to maintain a vocabulary of common terms and their Italian translations.

Technical terms: to translate or not?

description: should technical terms such as machine learning be translated or not? Proposed approaches:

1. always leave the term in English
2. always translate it into Italian
3. always translate it into Italian, but at the first occurrence also give the English term alongside it

outcome: No final decision yet, still being discussed.
votes:

davidemastricci, CaterinaBi
sharkovsky, lewtun, clone

How to address the reader

description: how to translate sentences that address the reader directly such as “But what if you want to compare…”
outcome: the proposed solution is to use the impersonal infinitive form “Ma cosa fare per paragonare…”. In exceptional cases where the sentence structure does not allow it, fall back to the informal singular form “Ma cosa fare se vuoi paragonare…”

Translating code comments

We should not translate comments in the code.

Vocabulary

Note: the decision whether to translate some words (marked with a *) is still being discussed; this is just a draft vocabulary!

English	Italiano
machine learning	apprendimento automatico*
token	token
tokenization	tokenizzazione
tokenizer	tokenizzatore
neural network training	addestramento di una rete neurale
pretrained model	modello pre-addestrato
fine-tune	affinare*
dataset	insieme di dati*
large dataset	insieme voluminoso di dati*
batch	batch
Hub	Hub
API	API
loop	ciclo*
training loop	ciclo di addestramento
distributed setup	sistema distribuito
custom metric	metrica personalizzata
metric	metrica
account	profilo*
checkpoint	checkpoint
label	etichetta*
training set	insieme di addestramento*
validation set	insieme di validazione*
test set	insieme di test*
overfitting	sovra-addatamento
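
For anyone who wants to check a translated chapter against this table automatically, the draft vocabulary could be kept as tab-separated text and split into agreed and still-undecided terms. This is only a minimal sketch: the function name `parse_vocabulary` and the three sample rows are illustrative assumptions, not part of the course tooling.

```python
def parse_vocabulary(table_text):
    """Split tab-separated 'English<TAB>Italiano' rows into agreed and undecided terms.

    Rows whose Italian side ends with '*' are treated as still under discussion.
    """
    agreed, undecided = {}, {}
    for line in table_text.strip().splitlines():
        english, italian = line.split("\t")
        english, italian = english.strip(), italian.strip()
        if italian.endswith("*"):
            # Drop the marker (and any space before it) before storing the term.
            undecided[english] = italian.rstrip("*").strip()
        else:
            agreed[english] = italian
    return agreed, undecided


# Hypothetical sample: three rows copied from the draft table above.
sample = "token\ttoken\nfine-tune\taffinare*\naccount\tprofilo *\n"
agreed, undecided = parse_vocabulary(sample)
print(agreed)
print(undecided)
```

The header row is left out of the sample on purpose; a real file would need it skipped before parsing.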

davidemastricci · April 1, 2022, 10:51am

Speaking about how to address the reader, the impersonal infinitive form sounds really good to me.
For the exceptional case, I would suggest using “Ma cosa fare se VUOI…”, simply because the docs are usually read, and the course attended, by one person at a time.

And if the intention was to keep it formal, I think the “VOLETE” form is just too formal for this context.

sharkovsky · April 1, 2022, 1:18pm

Hi Davide,
your suggestion about using the singular form makes sense. I’ve updated the post accordingly!

lewtun · April 1, 2022, 2:58pm

Thank you for starting this great post @sharkovsky !

About translating the code comments, I think this can be optional since right now the Colab notebooks we have for each chapter are still based off the English source. For reference, the other translations haven’t done this. How does that sound?

sharkovsky · April 1, 2022, 3:00pm

sounds great! I’ll update the post right away.

clone · April 1, 2022, 3:38pm

Thank you for opening this post! A few notes:

For the first question (whether to translate technical terms) I’d prefer option 3.
I am not a fan of addressing the reader directly. When I read technical material that addresses me directly as a reader, I instinctively tend to consider it low-quality content. I understand people have different opinions about this, so I will comply with whatever the majority decides.
Specific terms:
- I’m not sure preaddestrato is a valid word (can’t find it in the dictionary), so I would write it as pre-addestrato instead.
- account is very widely used in Italian, but we could translate it with profilo if we wanted.
- sovraddatamento has a valid meaning in Italian, but it’s quite specific. I would consider writing it as sovra-addatamento.

sharkovsky · April 1, 2022, 4:02pm

Thank you for your comments @clone!

I’ve updated the post to reflect your vote for the first question, and your suggestions for the words (thank you!).

For addressing the reader, is our suggested approach acceptable to you? The idea would be to use infinitive, impersonal and passive forms whenever possible, and address the reader directly only in the exceptional cases where the phrase sounds really weird otherwise. Is that ok for you?

clone · April 1, 2022, 7:55pm

Yep, the approach you proposed sounds like a good compromise.

CaterinaBi · April 2, 2022, 9:37pm

Hi everyone.

I’d go with option number 3 as well when it comes to how we translate the technical terms; I believe @lewtun is right.

As for the ‘how to address the reader’ question, I believe the trend in translation studies right now would be to use an infinitive: Come fare per…, i.e., forget about addressing the reader directly. All those expressions sound absolutely unnatural in Italian. Another possibility would be to use the first person plural (1PP), e.g. ‘Ma come fare se vogliamo…?’. What do you think?

PS. For fine-tune, I’d like to propose ‘ottimizzare’, and I agree that ‘pre-addestrato’ needs the dash.

gnolano · April 3, 2022, 9:41am

Hi everyone.

Thanks for the forum post.

With regard to technical terms, I think I will go with option 3 as well. I also agree with the proposed solution to use the impersonal infinitive to address the reader, which is the generally accepted way to do so.

michimichiamo · April 4, 2022, 10:49am

Hello everyone, I’m happy to join! Let me start by pointing out that I have no technical background in linguistics, thus my opinions mainly come from my academic experience in the field of ML/DL and my (mostly personal) understanding of the common use of the Italian language.

In response to @sharkovsky’s first post (btw, thank you for kicking off the work!):

Technical terms: I would actually go for a sort of 4th approach (let’s say the inverse of point 3); let me first explain why. On the one hand, the real purpose of a translation is to make the content accessible to people regardless of their familiarity with the source language. On the other hand, I think the translation of technical content should reflect the original purpose rather than mimic the text word for word: virtually no one uses translated terms in this field, and I don’t think it would be beneficial to insert them in the course. In practice, no one talks about “apprendimento automatico” instead of “Machine Learning” in technical contexts, just as no one talks about “condivisione delle macchine” instead of “car sharing” in everyday situations: a standard has already established itself, and I would not force our translation away from it. Besides, translating improves neither readability nor comprehension, because the main gap is technical rather than linguistic (I mean, “apprendimento automatico” does not convey any more information than “Machine Learning” to a beginner). I would therefore give the translation for the sake of completeness (and coherence with the main goal of translation), maybe next to the first occurrence of each term or by linking to the vocabulary/glossary, but in the actual text I would always use the original terms, such as “Machine Learning”.
How to address the reader: I agree with the solution proposed by @CaterinaBi, using the 1PP (‘Ma come fare se vogliamo…?’), since I generally agree with clone (not able to quote, sorry!).
Vocabulary: as already pointed out, I wouldn’t translate very widespread technical terms such as Machine Learning (see first point). As for “fine-tune”, I don’t agree with @CaterinaBi, since ‘ottimizzare’ could be misleading in the ML context (optimization algorithms such as Adam, optimization techniques such as Local Search), while I think “affinare” conveys the correct message and is not open to misunderstanding (maybe “affinare l’addestramento” if clarification is needed). As for “large dataset”, I would go with “grande insieme di dati”. I agree with the remaining proposed translations as reported in the current version of the table.

Please bear in mind that mine are “layman’s opinions”, meaning that they are the result of my personal experience alone. Anyway, I will be glad to comply with whatever the majority decides.
P.S. Sorry for the long reply, and thank you everyone for your work!

sharkovsky · April 6, 2022, 1:30pm

Hi everyone,
I seem to be unable to edit/update the post. @lewtun, are you able to help me out with this? I am not familiar with the Hugging Face forum, so maybe it’s normal. I need to be able to edit it in order to update the vocabulary/glossary.

It seems to me that we’ve reached a decision on:

how to address the reader: use the infinitive form “Ma cosa fare per paragonare…” whenever possible, and fall back onto 1PP “Ma cosa fare se vogliamo paragonare…” in exceptional cases.
code comments: never translate comments in the code.
vocabulary: I will update it as soon as I figure out how to edit my original post.

The one thing we’re still discussing is technical terms. It seems to me that two options, which are more or less opposite, are gaining traction:

1. always translate the term into Italian, but at the first occurrence also give the English term alongside it.
2. always leave the term in English, but at the first occurrence also give the Italian term alongside it.

I had not thought about option 2 but now it seems to be the best in my opinion. What does everyone think?
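
To make option 2 concrete, here is a minimal sketch of the rule applied mechanically: the English term is kept everywhere, and the Italian equivalent is added in parentheses after the first occurrence only. The function name `annotate_first_occurrence`, the one-entry glossary and the sample sentence are assumptions made for illustration; in practice this would be done by hand while translating.

```python
import re


def annotate_first_occurrence(text, glossary):
    """Keep English terms, adding the Italian gloss after the first occurrence only."""
    for english, italian in glossary.items():
        # Match the term as a whole word, ignoring case.
        pattern = re.compile(r"\b" + re.escape(english) + r"\b", flags=re.IGNORECASE)
        # count=1 restricts the substitution to the first match in the text.
        text = pattern.sub(lambda m: f"{m.group(0)} ({italian})", text, count=1)
    return text


# Hypothetical glossary entry and sentence, for illustration only.
glossary = {"machine learning": "apprendimento automatico"}
sentence = "Il machine learning è ovunque. Il machine learning richiede dati."
print(annotate_first_occurrence(sentence, glossary))
```

Under these assumptions the first “machine learning” becomes “machine learning (apprendimento automatico)” and the second is left untouched. The sketch does not handle overlapping entries such as “dataset” and “large dataset”, which would need longest-match-first handling.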

morenolq · April 6, 2022, 3:25pm

I translated the Educational Toolkit into Italian.
I think the best option is the second one in your previous answer. While translating, I found that rendering everything in Italian reads very strangely (when reading it again from an Italian reader’s perspective). I would vote for option 2 because it can be useful, for someone meeting a term for the first time, to have the Italian equivalent associated with it.

sharkovsky · April 6, 2022, 3:38pm

Hi Moreno,
thank you for sharing your experience!

I would say that with your vote and to ensure consistency with your other translation, it would be best to go for option 2.

Do you have experience with this forum? Do you know why I cannot edit my original post anymore, and who I could contact to resolve this?

Thank you!
Francesco

CaterinaBi · April 6, 2022, 4:42pm

Hello everyone, could one of you please check my translation of chapter1/1.mdx for consistency before I open a pull request as requested by @lewtun? Thanks!

github.com

CaterinaBi/course/blob/main/chapters/it/chapter1/1.mdx

# Introduzione

## Benvenuto/a al corso di 🤗!

<Youtube id="00GKzGyWFEs" />

Questo corso ti insegnerà a eseguire compiti di Natural Language Processing (NLP, *elaborazione del linguaggio naturale*) utilizzando le librerie dell'ecosistema di [Hugging Face](https://huggingface.co/): [🤗 Transformers](https://github.com/huggingface/transformers), [🤗 Datasets](https://github.com/huggingface/datasets), [🤗 Tokenizers](https://github.com/huggingface/tokenizers), e [🤗 Accelerates](https://github.com/huggingface/accelerate). Ti insegneremo anche ad usare il nostro [Hugging Face Hub](https://huggingface.co/models), che è completamente gratuito e senza pubblicità.


## Contenuti

Eccoti un breve riassunto dei contenuti del corso:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary.svg" alt="Brief overview of the chapters of the course.">
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary-dark.svg" alt="Brief overview of the chapters of the course.">
</div>

- I capitoli da 1 a 4 forniscono un'introduzione ai concetti principali della libreria 🤗 Transformers. Alla fine di questa parte del corso, conoscerai come funzionano i modelli Transformers e saprai come utilizzare un modello della [Hugging Face Hub](https://huggingface.co/models), affinarlo in un dataset, e condividere i tuoi risultati nell'Hub!
- I capitoli da 5 a 8 insegnano le basi degli 🤗 Dataset e degli 🤗 Tokeniser, per poi esplorare alcuni compiti classici di NLP. Alla fine di questa parte, saprai far fronte ai problemi di NLP più comuni in maniera autonoma.

This file has been truncated.

morenolq · April 7, 2022, 12:26pm

Unfortunately, I’m not. I think it’s the same platform used by other projects (e.g., PyTorch) but I’ve just used it as a basic user.

sharkovsky · April 7, 2022, 1:35pm

Hi Caterina!

First of all, congratulations on the excellent work! Your translation struck me as very accurate, precise and fluent. Below are a few comments; as you’ll see, they are all minor. As far as I’m concerned, once the typos are fixed you could go ahead and open the pull request, but I’d be glad if you also wanted to take my other comments into account.

You translated task as compito. I like this translation and agree that the word task should be translated, so I’ll add it to our vocabulary of standardized translations as soon as I can.
typo: Accelerates should not have the final s
You assigned the feminine gender to Hugging Face Hub. Of course it’s entirely arbitrary, but the masculine sounds better to me… What do you think?
typo: in one place you wrote Tokeniser (British English spelling). I think Hugging Face mostly follows American English, so Tokenizer.
I’ll also add demo to the words not to translate; that seems like a good idea!
You kept the Saxon genitive in fast.ai’s Practical Deep Learning for Coders, but I would remove it like this: il Practical Deep Learning for Coders di fast.ai

Thanks again!
Francesco

CaterinaBi · April 7, 2022, 2:13pm

Thanks, Francesco!

I agree with everything. I’ve made the corrections and am now sending the… “richiesta di estrazione” (pull request)?? It sounds odd put that way.

Best wishes, and thank you too!

Caterina

CaterinaBi · April 7, 2022, 2:19pm

Hello @lewtun, I’ve officially just opened my first pull request ever (!!!) for 1.mdx.

I hope all is good!