How to solve factual inconsistency when fine-tuning

I fine-tuned the pszemraj/led-large-book-summary model on financial reports and was able to get a ROUGE-1 score close to 0.6.
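For context, ROUGE-1 only measures unigram overlap between the generated and reference summaries, so a summary can score high while getting the actual figures wrong. A minimal sketch of ROUGE-1 F1 in plain Python (the function name and examples are my own, not from any library):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Compute ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Each shared unigram counts up to its frequency in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

reference = "Revenue grew 12% to $4.2B in Q3."
wrong_numbers = "Revenue grew 15% to $5.1B in Q3."
print(rouge1_f1(wrong_numbers, reference))  # still high, despite both figures being wrong
```

This is exactly why a ROUGE of 0.6 can coexist with wrong numbers: the metric rewards wording overlap, not factual correctness.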

But after manually checking each model-generated summary against the actual financial report, it turned out that most of the key information, especially the numbers, was wrong.
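One cheap automatic check that catches exactly this failure mode is to extract every number from the generated summary and verify it also appears in the source report. A rough sketch (the regex and function name are my own assumptions, tune them to your report format):

```python
import re

# Matches integers and numbers with . or , separators (e.g. 3.4, 1,200.50).
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")

def unsupported_numbers(summary: str, source: str) -> list[str]:
    """Return numbers that appear in the summary but not in the source text."""
    source_nums = set(NUM_RE.findall(source))
    return [n for n in NUM_RE.findall(summary) if n not in source_nums]

source = "Net income was 3.4 million USD, up from 2.9 million a year earlier."
summary = "Net income was 4.3 million USD, up from 2.9 million."
print(unsupported_numbers(summary, source))  # flags the hallucinated figure "4.3"
```

Running this over a validation set gives you a rough hallucination rate per model, which is a more useful signal than ROUGE alone for this problem.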

I also compared a summary from OpenAI's GPT-4 on the same financial report, and all the numbers and key information in it were correct.

I would like to hear your experiences, workarounds, and ideas on this issue and how to handle factual inconsistency.

Thanks

It seems like you are Asian. I don't know anything about Hugging Face and I wanted some help. I am from Nepal and I want to use Hugging Face, but the problem is I am confused about Hugging Face billing. Please help me:

My dollar card is provided by my local bank, where my bank account is held. Even though it is prepaid, it falls under the Visa category, and I believe Hugging Face accepts Visa cards, even though the card is not linked directly to my bank account. I understand Hugging Face has banned prepaid cards due to fraud issues, but my dollar card is legitimate and is only issued by my bank to its own account holders. The bank has strict account-opening policies, so there is no way a scammer could obtain that particular dollar card. Will Hugging Face accept my card? I believe the billing system is managed by Stripe, and even though Stripe is not available in my country, I hope I can pay with my dollar card. Please help me, I beg you…

Which card are you using? Could you please tell me, and from which country?

I'm using Hugging Face models locally, so there is no need to pay anything since they are free.

Oh, so what is the limit on the Hugging Face free tier / local use?

I want to create a code-generator type of tool driven by user prompts. Is that possible on the free tier? What are the limitations? I am actually new to Hugging Face; could you help me with it?

The "factual" consistency that you want is tough to enforce on the model when fine-tuning it.
In my view, this pattern may come from the architecture itself, which is related to Longformer. Longformer uses a sliding window and a "global" representation of the text when computing attention, as explained in the paper: pszemraj/led-large-book-summary

This kind of architecture may compress a lot of token-level information into a more global representation.
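To illustrate the point, Longformer-style attention lets each token attend only to a fixed-size window of neighbours, plus a handful of "global" tokens that attend everywhere. A toy mask builder in pure Python (parameter names are my own; real implementations work on tensors, not nested lists):

```python
def longformer_mask(seq_len: int, window: int, global_tokens: set[int]) -> list[list[bool]]:
    """Build a boolean attention mask: mask[i][j] is True if token i may attend to token j.

    Local tokens see only neighbours within `window` positions; tokens in
    `global_tokens` see, and are seen by, every position, mimicking
    Longformer's combination of sliding-window and global attention.
    """
    mask = [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]
    for g in global_tokens:
        for j in range(seq_len):
            mask[g][j] = True  # the global token attends everywhere
            mask[j][g] = True  # every token attends to the global token
    return mask

m = longformer_mask(seq_len=8, window=1, global_tokens={0})
print(m[4][7])  # distant local pair: no direct connection
print(m[4][0])  # but every token still reaches the global token
```

A precise number in the middle of a long report only reaches the summary through these narrow local windows and the compressed global tokens, which is one plausible reason fine details like figures get lost.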

For your task, I would recommend using a small LLM, or any text-summarization model that relies on the "vanilla" Transformer architecture and is already pretrained and/or fine-tuned on financial datasets, which are more likely to contain numbers!


There is no limit if you download and run the models and datasets on your own device, and there are free Spaces to run your application on Hugging Face.
You only need to pay if you host models and use a GPU to run them.

This course will give you a brief intro to Hugging Face.

Thank you. I had tried other models like T5, Pegasus, Llama, and Mistral. They are more factually accurate than the LED models, especially with numbers. But after fine-tuning, these models' ROUGE scores were lower than LED's, and LED produced the most human-like summaries with a 16k context size.

But, as you suggested, I will try out smaller models with 1B–1.5B parameters.

Hey!

Maybe give a try to this one : human-centered-summarization/financial-summarization-pegasus

Most financial data is tied to numbers, so it may be worth a try!


Thank you again :blue_heart:

And after looking into that model, I found another fine-tuned version of it.

I will try out that one as well.