Custom Synthetic Datasets for Finance & Citizen Science Applications
Hi everyone!
I’m Emmitt from Grandma’s Boy Labs, and I’m excited to share a new project that might be of interest to folks building and fine-tuning LLMs for specialized domains like finance and healthcare.
What I’m Building
I’ve been developing high-quality, synthetically generated conversational datasets using GPT-based roleplay simulations. The datasets are created through structured prompt engineering: models simulate realistic expert-client conversations based on career personality profiles, domain knowledge, and user intent.
Current Focus Areas:
Finance & Investment Advising
- Portfolio management strategy discussions
- Risk tolerance assessments
- Client education (e.g., explaining ETFs, diversification, etc.)
- Investment advising scenarios (new investors, retirees, etc.)
Citizen Science & Public Health
- Maternal health Q&A simulations (based on real patient education needs)
- Community-driven knowledge-building examples
- Accessible and diverse synthetic dialogue for low-resource domains
These datasets are designed to be modular, scalable, and adaptable to any model with API access (not limited to GPT—bring your own LLM!).
How the Datasets Are Made
The datasets are built using a blend of:
- Role-based personas with detailed career profiles
- Prompt chains for guided conversation structure
- Dialogue simulations for specific use cases
- Annotated outputs (for fine-tuning, QA, or supervised RL)
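As a rough illustration of how role-based personas and a prompt chain could fit together, here is a minimal Python sketch. It is not the actual generation pipeline: the `Persona` class, the `system_prompt` and `build_prompt_chain` helpers, the prompt wording, and the example personas are all hypothetical, and no model is called. It only shows one way to turn two personas and a list of conversation stages into ordered prompt steps that could then be sent to any LLM.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical persona record; the field names are illustrative only."""
    role: str     # e.g. "advisor" or "client"
    profile: str  # short career or background description
    goal: str     # what this speaker wants from the conversation


def system_prompt(persona: Persona, topic: str) -> str:
    """Render one persona into a system prompt for a single roleplay turn."""
    return (
        f"You are playing the {persona.role} in a conversation about {topic}. "
        f"Background: {persona.profile} "
        f"Goal: {persona.goal} "
        "Stay in character and reply with one conversational turn."
    )


def build_prompt_chain(advisor: Persona, client: Persona, topic: str, stages: list) -> list:
    """Return one prompt step per stage, alternating client and advisor turns."""
    chain = []
    for i, stage in enumerate(stages):
        # Assumption: the client opens, then the speakers alternate.
        speaker = client if i % 2 == 0 else advisor
        chain.append({
            "turn_index": i,
            "speaker": speaker.role,
            "system": system_prompt(speaker, topic),
            "instruction": stage,
        })
    return chain


# Made-up personas and stages, purely for demonstration.
advisor = Persona("advisor", "A financial planner with many years of client work.",
                  "Assess the client's risk tolerance.")
client = Persona("client", "A first-time investor.",
                 "Understand how much risk is appropriate.")
chain = build_prompt_chain(
    advisor, client, "risk tolerance",
    ["Ask an opening question.", "Answer and ask a follow-up.", "Respond to the follow-up."],
)
for step in chain:
    print(step["turn_index"], step["speaker"], step["instruction"])
```

Each step's `system` and `instruction` strings would be passed to whichever model API is in use, and the model's reply recorded as that turn's message.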
Each dataset is formatted as .json, with structured fields for:
- speaker
- message
- topic
- turn_index
- tags (e.g., “client education”, “risk profiling”)
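To make the field list concrete, here is a small Python sketch of what loading and sanity-checking a file might look like. The field names come from the list above; everything else is an assumption: the field types, the idea that a file is a JSON array of turns, the `validate_turn` helper, and the two sample turns (which are made up for illustration and not taken from the actual datasets).

```python
import json

# Field names are from the post; the expected types are assumptions.
REQUIRED_FIELDS = {
    "speaker": str,
    "message": str,
    "topic": str,
    "turn_index": int,
    "tags": list,
}


def validate_turn(turn: dict) -> list:
    """Return a list of problems found in one turn (empty if it looks fine)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in turn:
            problems.append(f"missing field: {field}")
        elif not isinstance(turn[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


# Made-up sample data, assuming one file holds a JSON array of turns.
sample = json.loads("""
[
  {"speaker": "client", "message": "What is an ETF?",
   "topic": "client education", "turn_index": 0, "tags": ["client education"]},
  {"speaker": "advisor", "message": "An ETF is a fund that trades on an exchange like a stock.",
   "topic": "client education", "turn_index": 1, "tags": ["client education"]}
]
""")

errors = [validate_turn(turn) for turn in sample]
print(errors)
```

A check like this is handy before fine-tuning, since a single turn with a missing or mistyped field can otherwise surface much later as a confusing training error.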
Availability
You can browse and purchase the datasets directly from grandmasboylabs.com. I offer:
- One-time dataset purchases
- Full documentation and metadata on generation process
- Licensing for commercial and open-source fine-tuning
Coming Soon: Model Demo
I’m also working on a Hugging Face Space demo of a fine-tuned investment advising model trained on one of the early datasets. Users will be able to:
- Try a simulated intake form
- Interact with the model in real time
- Explore how the dataset translates to fine-tuned behavior
Open to Collaboration
I’d love to connect with:
- Researchers working on synthetic data evaluation
- Practitioners fine-tuning models for verticals like finance or health
- Developers building personalized advisory tools with LLMs
If you’re curious about the data, want to collaborate, or just want to chat about synthetic datasets—reach out! I’d love to hear your thoughts and feedback.
Looking forward to connecting!