While studying various aspects of training, testing, and using LLMs, I keep coming back to one question: what is the starting point, and the route, for building an LLM that is multilingual, handles tasks like generation, summarization, sentiment analysis, and NER, and is fine-tuned for the finance, management, and business domain, something like the internal-use Bloomberg LLM I read about? FinGPT, FinLlama, and FinBERT come to mind as starting points that could then be made multilingual. Training from scratch is not an option for an individual, and even combining multilingual support, the finance domain, and multiple tasks may be a big undertaking. I don't know whether there is any model close to these capabilities that can be fine-tuned.
Recent ordinary LLMs may not be so bad at financial knowledge either.
You’re on the right path with multilingual, multi-task models in the finance domain. I’ve built Triskel Data, a curated archive of high-value, structured legal and financial datasets such as:
- CourtListener (legal rulings)
- SEC filings (fully extracted)
- Federal Register (regulatory history)
- AI patent datasets
All are cleaned and tokenization-ready in .jsonl format, not raw scrapes.
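For anyone unfamiliar with the format, .jsonl (JSON Lines) stores one JSON object per line, so records can be streamed without loading a whole file. Below is a minimal sketch of reading it with the Python standard library. The field names `source` and `text` and the sample records are made up for illustration and are not the actual schema of any dataset listed above.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical records standing in for a real .jsonl file;
# the real field names may differ.
sample = io.StringIO(
    '{"source": "sec_filing", "text": "Revenue rose 4% year over year."}\n'
    '\n'
    '{"source": "court_ruling", "text": "The motion is denied."}\n'
)

records = list(read_jsonl(sample))
texts = [record["text"] for record in records]
```

With a real file, `open(path, encoding="utf-8")` would take the place of the `io.StringIO` object, and `texts` could then be passed to whichever tokenizer the chosen base model uses.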
A Developer Tier is available with limited access for serious users. While not free, it’s accessible enough to get started without the typical scraping or cleanup burden.