You’re on the right path with multilingual, multi-task models in the finance domain. I’ve built Triskel Data, a curated archive of high-value, structured legal and financial datasets, including:
- CourtListener (legal rulings)
- SEC filings (fully extracted)
- Federal Register (regulatory history)
- AI patent datasets
All are cleaned and tokenization-ready in .jsonl format, not raw scrapes.
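If it helps, here is a minimal sketch of how a .jsonl file can be streamed in Python, one JSON object per line. The helper name `iter_jsonl`, the file name `sec_filings.jsonl`, and the `text` field are assumptions for illustration only, not the actual Triskel Data schema.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical usage: the file name and the "text" field are placeholders.
# for record in iter_jsonl("sec_filings.jsonl"):
#     print(record.get("text", "")[:80])
```

Because records are yielded one at a time, a large file can be fed to a tokenizer without loading it all into memory.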
A Developer Tier is available with limited access for serious users. While not free, it’s accessible enough to get started without the typical scraping or cleanup burden.