Hi,
I used to successfully run the rbawden/modern_french_normalisation model to convert Old French to modern French, but recently, the output has become incorrect.
Steps to reproduce (code provided on the Hugging Face page):
```python
from transformers import pipeline

normaliser = pipeline(
    model="rbawden/modern_french_normalisation",
    batch_size=32,
    beam_size=5,
    cache_file="./cache.pickle",
    trust_remote_code=True,
)

list_inputs = ["Elle haïſſoit particulierement le Cardinal de Lorraine;", "Adieu, i'iray chez vous tantoſt vous rendre grace."]
list_outputs = normaliser(list_inputs)
print(list_outputs)
```
Expected output (as shown on the Hugging Face page that I was able to obtain previously):
```python
[{'text': 'Elle haïssait particulièrement le Cardinal de Lorraine;',
  'alignment': [([0, 4], [0, 4]), ([4, 5], [4, 5]), ([5, 13], [5, 13]), ([13, 14], [13, 14]), ([14, 30], [14, 30]), ([30, 31], [30, 31]), ([31, 33], [31, 33]), ([33, 34], [33, 34]), ([34, 42], [34, 42]), ([42, 43], [42, 43]), ([43, 45], [43, 45]), ([45, 46], [45, 46]), ([46, 54], [46, 54]), ([54, 55], [54, 55])]},
 {'text': "Adieu, j'irai chez vous tantôt vous rendre grâce.",
  'alignment': [([0, 5], [0, 5]), ([5, 6], [5, 6]), ([6, 7], [6, 7]), ([7, 9], [7, 9]), ([9, 13], [9, 13]), ([13, 14], [13, 14]), ([14, 18], [14, 18]), ([18, 19], [18, 19]), ([19, 23], [19, 23]), ([23, 24], [23, 24]), ([24, 31], [24, 30]), ([31, 32], [30, 31]), ([32, 36], [31, 35]), ([36, 37], [35, 36]), ([37, 43], [36, 42]), ([43, 44], [42, 43]), ([44, 49], [43, 48]), ([49, 50], [48, 49])]}]
```
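For context, this is how I consume the `alignment` field — a minimal sketch assuming each pair is a pair of end-exclusive character spans `([src_start, src_end], [tgt_start, tgt_end])` into the input and output strings; `project_span` is my own helper, not part of the pipeline:

```python
def project_span(alignment, src_start, src_end):
    """Map a character span of the original text onto the normalised text,
    using the pipeline's alignment pairs. Spans are assumed end-exclusive."""
    covered = [tgt for src, tgt in alignment
               if src[0] >= src_start and src[1] <= src_end]
    if not covered:
        return None
    return covered[0][0], covered[-1][1]

normalised = "Adieu, j'irai chez vous tantôt vous rendre grâce."
alignment = [([0, 5], [0, 5]), ([5, 6], [5, 6]), ([6, 7], [6, 7]),
             ([7, 9], [7, 9]), ([9, 13], [9, 13]), ([13, 14], [13, 14]),
             ([14, 18], [14, 18]), ([18, 19], [18, 19]), ([19, 23], [19, 23]),
             ([23, 24], [23, 24]), ([24, 31], [24, 30]), ([31, 32], [30, 31]),
             ([32, 36], [31, 35]), ([36, 37], [35, 36]), ([37, 43], [36, 42]),
             ([43, 44], [42, 43]), ([44, 49], [43, 48]), ([49, 50], [48, 49])]

# "tantoſt" occupies [24, 31] in the original sentence:
start, end = project_span(alignment, 24, 31)
print(normalised[start:end])  # → tantôt
```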
Now I am getting errors in the output (“haïssoit” instead of “haïssait”, “grace” instead of “grâce”).
Additionally, when using the following input:
```python
["Le Loup ne fut pas longtems à arriver à la maiſon de la Mere-grand, il heurte: Toc, toc, qui eſt là?"]
```
the output is really strange, with missing and repeated words:
```python
'Le Loup ne fut pas longtems à arriver Fr la maison ne fut Pas-grand, est heurte ne Toc, long, qui est là?'
```
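To make the corruption easier to see, I ran a quick word-count comparison of the input against the garbled output (plain Python, my own check, not part of the pipeline; tokens are lowercased and stripped of surrounding punctuation before counting):

```python
from collections import Counter
import string

original = ("Le Loup ne fut pas longtems à arriver à la maiſon de la "
            "Mere-grand, il heurte: Toc, toc, qui eſt là?")
garbled = ("Le Loup ne fut pas longtems à arriver Fr la maison ne fut "
           "Pas-grand, est heurte ne Toc, long, qui est là?")

def word_counts(text):
    # Lowercase and strip surrounding punctuation so "Toc," and "toc," match.
    return Counter(w.strip(string.punctuation).lower() for w in text.split())

src, out = word_counts(original), word_counts(garbled)
repeated = {w for w in out if out[w] > src[w] >= 1}  # duplicated by the model
missing = {w for w in src if src[w] > out[w]}        # dropped by the model
print(sorted(repeated))  # → ['fut', 'ne']
print(sorted(missing))
```

So “ne” and “fut” get duplicated while content words such as “maiſon” and “eſt” disappear entirely.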
When loading the pipeline, I also notice these warning messages:
```
- pipeline.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
```
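Since that warning suggests pinning a revision, I also tried pinning the pipeline to a fixed commit as a workaround — sketch below; the hash is a placeholder, not a real revision of the repo:

```python
# Sketch of pinning the pipeline to a fixed commit so that newer remote code
# (pipeline.py) is not silently downloaded. PINNED_REVISION is a placeholder
# hash, not a real revision of rbawden/modern_french_normalisation.
PINNED_REVISION = "0123abcd"

pipeline_kwargs = dict(
    model="rbawden/modern_french_normalisation",
    revision=PINNED_REVISION,      # pins both remote code and weights
    batch_size=32,
    beam_size=5,
    cache_file="./cache.pickle",
    trust_remote_code=True,
)

# from transformers import pipeline
# normaliser = pipeline(**pipeline_kwargs)  # left commented out: downloads the model
```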
as well as:
```
Some weights of FSMTForConditionalGeneration were not initialized from the model checkpoint at rbawden/modern_french_normalisation and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
I don’t recall whether these messages were displayed when the pipeline was working correctly.
Thanks for any help!