AI Driven Synthetic Custom Datasets for Finance and Citizen Science

:rocket: Custom Synthetic Datasets for Finance & Citizen Science Applications

Hi everyone! :waving_hand:

I’m Emmitt from Grandma’s Boy Labs, and I’m excited to share a new project that might be of interest to folks building and fine-tuning LLMs for specialized domains like finance and healthcare.

:light_bulb: What I’m Building

I’ve been developing high-quality synthetically generated conversational datasets using GPT-based roleplay simulations. These datasets are created through structured prompt engineering, where models simulate realistic expert-client conversations based on career personality profiles, domain knowledge, and user intent.

Current Focus Areas:

:chart_increasing: Finance & Investment Advising

  • Portfolio management strategy discussions
  • Risk tolerance assessments
  • Client education (e.g., explaining ETFs, diversification, etc.)
  • Investment advising scenarios (new investors, retirees, etc.)

:dna: Citizen Science & Public Health

  • Maternal health Q&A simulations (based on real patient education needs)
  • Community-driven knowledge-building examples
  • Accessible and diverse synthetic dialogue for low-resource domains

These datasets are designed to be modular, scalable, and adaptable to any model with API access (not limited to GPT—bring your own LLM!).


:gear: How the Datasets Are Made

Using a blend of:

  • :brain: Role-based personas with detailed career profiles
  • :toolbox: Prompt chains for guided conversation structure
  • :test_tube: Dialogue simulations for specific use cases
  • :bar_chart: Annotated outputs (for fine-tuning, QA, or supervised RL)

Each dataset is formatted in .json with structured fields for:

  • speaker
  • message
  • topic
  • turn_index
  • tags (e.g., “client education”, “risk profiling”)

:floppy_disk: Availability

You can browse and purchase the datasets directly from grandmasboylabs.com. I offer:

  • One-time dataset purchases
  • Full documentation and metadata on generation process
  • Licensing for commercial and open-source fine-tuning

:test_tube: Coming Soon: Model Demo

I’m also working on a Hugging Face Space demo of a fine-tuned investment advising model trained on one of the early datasets. Users will be able to:

  • Try a simulated intake form
  • Interact with the model in real time
  • Explore how the dataset translates to fine-tuned behavior

:handshake: Open to Collaboration

I’d love to connect with:

  • Researchers working on synthetic data evaluation
  • Practitioners fine-tuning models for verticals like finance or health
  • Developers building personalized advisory tools with LLMs

If you’re curious about the data, want to collaborate, or just want to chat about synthetic datasets—reach out! I’d love to hear your thoughts and feedback.

Looking forward to connecting!